1,637 results for "COMPUTER memory management"
Search Results
2. Improving Memory Management, Performance with Rust: Why Rust is becoming the programming language of choice for many high-level developers.
- Author
-
Williams, Alex
- Subjects
- *
WEB development , *PROGRAMMING languages , *COMPUTER memory management , *COMPUTER programming , *LEARNING curve - Abstract
Rust is revolutionizing high-performance web service development with its cutting-edge memory safety features, efficient resource management, and speed, proving particularly effective in web environments by combining low-level control with high-level concurrency. The language's advanced ownership model and comprehensive type system prevent memory issues at compile time, boosting both reliability and performance, though it poses challenges such as a steep learning curve and a complex ecosystem. Rust's growing popularity, bolstered by endorsements such as Linus Torvalds' and its suitability for enterprise-level applications, stems from its ability to ensure safe memory management, thread safety, and exceptional performance, despite its initial difficulty for new programmers.
- Published
- 2024
- Full Text
- View/download PDF
3. Computation of Topological Indices of Binary and Ternary Trees using Algorithmic Approach.
- Author
-
Elahi, Kashif, Ahmad, Ali, Asim, Muhammad Ahsan, and Hasni, Roslan
- Subjects
ALGORITHMS ,COMPILERS (Computer programs) ,COMPUTER memory management ,COMPUTER software ,CHEMICAL structure - Abstract
In this paper, algorithms are used to compute distance-based topological indices for the Complete Binary Tree (CBT) and the Complete Ternary Tree (CTT). Computing distance-based topological indices is complex for varied heights of CBT and CTT; hence, the algorithms designed here to compute the distance between any two vertices make it possible to compute the required topological indices for CBT and CTT. The distance calculator algorithm designed for this study can also be adapted to digital chemical structures, mathematical chemistry, network traffic control in wireless networks, search applications, high-bandwidth routing, parse construction in compilers, and memory management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. An Efficient Memory Management for Mobile Operating Systems Based on Prediction of Relaunch Distance.
- Author
-
Jaehwan Lee and Sangoh Park
- Subjects
MOBILE operating systems ,COMPUTER memory management ,MOBILE apps ,ARTIFICIAL neural networks ,PREDICTION models - Abstract
Recently, various mobile apps have included more features to improve user convenience. Mobile operating systems load as many apps as possible into memory for faster app launching and execution. The least recently used (LRU)-based termination of cached apps is a widely adopted approach when free space in main memory runs low. However, LRU-based cached app termination does not distinguish between frequently and infrequently used apps, and app launch performance degrades if LRU terminates frequently used apps. Recent studies have suggested the potential of using users' app usage patterns to predict the next app launch and address the limitations of the LRU approach. However, existing methods focus only on predicting the probability of the next launch and do not consider how soon the app will launch again. In this paper, we present a new approach for predicting future app launches by utilizing the relaunch distance. We define the relaunch distance as the interval between two consecutive launches of an app and propose a memory management scheme based on app relaunch prediction (M2ARP). M2ARP utilizes past app usage patterns to predict the relaunch distance. It uses the predicted relaunch distance to determine which apps are least likely to be launched soon and terminates them to improve the efficiency of main memory. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
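The relaunch-distance idea above can be sketched in a few lines: predict each app's next launch from its past launch intervals, and terminate the cached app predicted to be needed furthest in the future. This is a hypothetical simplification (M2ARP itself learns usage patterns with a prediction model); all names are illustrative.

```python
from collections import defaultdict

class RelaunchPredictor:
    """Predicts an app's relaunch distance as the mean interval between
    its past launches -- a simplified stand-in for a learned model."""

    def __init__(self):
        self.launches = defaultdict(list)  # app name -> launch timestamps
        self.clock = 0                     # logical time, ticks per launch

    def record_launch(self, app):
        self.clock += 1
        self.launches[app].append(self.clock)

    def predicted_distance(self, app):
        times = self.launches[app]
        if len(times) < 2:
            return float("inf")            # never relaunched: assume far future
        intervals = [b - a for a, b in zip(times, times[1:])]
        avg = sum(intervals) / len(intervals)
        # Expected next launch = last launch + average interval.
        return (times[-1] + avg) - self.clock

    def pick_victim(self, cached_apps):
        # Terminate the cached app least likely to launch soon,
        # i.e., the one with the largest predicted relaunch distance.
        return max(cached_apps, key=self.predicted_distance)
```

A victim picked this way keeps a frequently relaunched app cached even when it was not the most recently used, which is exactly where plain LRU goes wrong.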
5. Learning-Based Memory Allocation for C++ Server Workloads: Technical Perspective.
- Author
-
Lea, Doug
- Subjects
- *
COMPUTER memory management , *ARTIFICIAL neural networks - Abstract
An introduction to the article focusing on learning-based allocation for C++ server workloads and neural network design and deployment is presented.
- Published
- 2024
- Full Text
- View/download PDF
6. Approximate Content-Addressable Memories: A Review.
- Author
-
Garzón, Esteban, Yavits, Leonid, Teman, Adam, and Lanuzza, Marco
- Subjects
ASSOCIATIVE storage ,COMPUTER memory management ,GENOMICS ,BIG data ,COMPUTING platforms - Abstract
Content-addressable memory (CAM) has been part of the memory market for more than five decades. CAM can carry out a single clock cycle lookup based on the content rather than an address. Thanks to this attractive feature, CAM is utilized in memory systems where a high-speed content lookup technique is required. However, typical CAM applications only support exact matching, as opposed to approximate matching, where a certain Hamming distance (several mismatching characters between a query pattern and the dataset stored in CAM) needs to be tolerated. Recent interest in approximate search has led to the development of new CAM-based alternatives, accelerating the processing of large data workloads in the realm of big data, genomics, and other data-intensive applications. In this review, we provide an overview of approximate CAM and describe its current and potential applications that would benefit from approximate search computing. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
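A small software model makes the approximate-matching notion concrete: a lookup returns every stored word within a given Hamming distance of the query. A real approximate CAM evaluates all rows in parallel in a single clock cycle; this sketch only illustrates the matching semantics, with illustrative data.

```python
def hamming(a, b):
    """Number of mismatching characters between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def approximate_match(cam, query, max_distance):
    """Return every stored entry within `max_distance` mismatches of the
    query. With max_distance=0 this degenerates to exact CAM lookup."""
    return [word for word in cam if hamming(word, query) <= max_distance]

# Toy dataset, e.g. short genomic patterns stored in the CAM.
cam = ["ACGT", "ACGA", "TTTT"]
```

For example, `approximate_match(cam, "ACGT", 1)` tolerates one mismatching character and so returns both `"ACGT"` and `"ACGA"`.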
7. Can Rust finally replace C?: A qualitative and quantitative analysis.
- Author
-
Gulati, Aryan
- Subjects
PROGRAMMING languages ,COMMUNITY support ,WORKFLOW management ,COMPUTER memory management ,QUANTITATIVE research - Abstract
The aim of this research paper is to conduct a comprehensive analysis of Rust and C, with the objective of assessing whether Rust can finally replace C as a preferred programming language. The analysis encompasses various aspects such as performance benchmarks, memory management, error handling mechanisms, security vulnerabilities, language syntax and readability, type safety and memory management, tooling and libraries support, development workflows, learning curve and community support, the Rust ecosystem, adoption in various industries, case studies of successful Rust projects, and a comparison of industry trends and language preferences. The findings and analysis highlight the strengths and considerations of each language, providing insights into the potential of Rust to replace C in different domains. [ABSTRACT FROM AUTHOR]
- Published
- 2022
8. Modernization efforts for the R-Matrix code SAMMY.
- Author
-
Wiarda, Dorothea, Arbanas, Goran, Brown, Jesse M., Holcomb, Andrew M., Pigni, Marco T., McDonnel, Jordan, and Chapman, Chris
- Subjects
- *
R-matrices , *NUCLEAR cross sections , *RESONANCE , *DATA analysis , *COMPUTER memory management - Abstract
The R-Matrix code SAMMY [1] is a widely used nuclear data evaluation code focused on the resolved range, which includes corrections for experimental effects. The code is still mostly written in FORTRAN 77 and uses a memory management system suitable for the time of its initial writing in 1984. A modernization effort is underway to update the code to modern software development practices. A continuous-integration testing framework was added to automate the large existing set of test cases. Improvements in memory management were implemented to make the code easier to maintain and enable enhancements. The resonance parameters and covariance information are now stored in C++ objects shared by SAMMY and AMPX [2], which is the processing code that generates nuclear data libraries for SCALE [3]. Further plans include switching to the Evaluated Nuclear Data File (ENDF) reading and writing routines in AMPX because these routines are more robust, easier to maintain, and support more features. Support for the new Generalized Nuclear Database Structure (GNDS) format [4] is also of interest. GNDS would allow sharing not only the resonance parameters but also the parameters associated with experimental corrections. The data are currently available in a binary SAMMY format, and the ability to export them to GNDS would make them more widely available and shareable. The next step will be to use the same resonance processing code at 0 K in AMPX and SAMMY as an available formalism. Then, any improvements in the formalism can immediately be tested in SCALE because the reconstruction in AMPX will use the same cross section model. The new data library can then be used for testing using the VALID Benchmark suite [5] or other suitable benchmark suites. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
9. New Leaves: Riffling the History of Digital Pagination.
- Author
-
Eve, Martin Paul
- Subjects
- *
PDF (Computer file format) , *IMAGINATION , *TELECOMMUNICATION , *COMPUTER memory management , *WORLD Wide Web , *VISUAL communication , *TIME perception - Abstract
My thanks to John E. Warnock for insights into the early resistance to the PDF format; to Steven Bagley, David Brailsford, and Jeanine Finn for sources of early criticisms of PDF pagination and trans-media constraint; and to Nelson H. F. Beebe and Karl Berry for documentation of TEX and its pagination standards. What is significant is that the creation of writeable pagination within word-processor contexts comes well before the advent of disseminable WORM paginated formats such as the PDF. Digital pagination in the form of a PDF introduced a trans-media substitutability to malleable digital surfaces for the first time, even while it brought a read-only paradigm within the page context itself.[71] While PDF eventually came to dominate the industry, Warnock and Geschke write that they were "surprised" by PDF's "slow growth."[84] In Warnock's view, PDF was widely misunderstood at the time of its inception. However, as compositing became a de-skilled profession[86] and with the "explosive growth of the use of the internet" (Warnock's words), a commensurate success came to Adobe and its PDF format.[87] The initial iteration of PDF, "The Camelot Project", specifically aimed to solve two fundamental problems in the world of computer graphics and typography: 1. [Extracted from the article]
- Published
- 2022
- Full Text
- View/download PDF
10. Garbage Collection as a Joint Venture.
- Author
-
DEGENBAEV, ULAN, LIPPAUTZ, MICHAEL, and PAYER, HANNES
- Subjects
- *
GARBAGE collection (Computer science) , *COMPUTER memory management , *COMPUTER storage devices , *COMPUTER software , *PROGRAMMING languages - Abstract
The article discusses a collaborative approach to reclaiming computer memory in heterogenous software systems through automated memory management using garbage collection.
- Published
- 2019
- Full Text
- View/download PDF
11. Using remote cache service for bazel.
- Author
-
Lam, Alpha
- Subjects
- *
CACHE memory , *CACHE memory equipment , *COMPUTER storage devices , *COMPUTER memory management , *PARALLEL programs (Computer programs) , *PARALLEL processing - Abstract
Save time by sharing and reusing build and test output. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
12. Multithreaded wedge detection method on triangular 3D CAD objects using mesh traversal method.
- Author
-
Kırık, Özkan and Özdemir, Caner
- Subjects
COMPUTER-aided design ,WEDGES ,ALGORITHMS ,COMPUTER memory management ,PARALLEL computers ,MULTICORE processors - Abstract
In this study, a multithreaded method is proposed to detect and extract wedges from triangular-mesh three-dimensional computer-aided design objects. Wedge detection is a time-consuming process for such objects when they have a large number of facets. To take advantage of parallel computing opportunities, the algorithm is refactored in this study: the scope of variables, memory management, and stack use are optimized for efficient use of computational resources. The proposed method focuses on calculation efficiency and performance on multicore/multithreaded processors, and it is evaluated with benchmark, complex, and realistic objects. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
13. Software Challenges for the Changing Storage Landscape.
- Author
-
WADDINGTON, DANIEL and HARRIS, JIM
- Subjects
- *
DATA warehousing software , *SOFTWARE architecture , *COMPUTER memory management , *KERNEL operating systems , *NONVOLATILE memory - Abstract
The article reports on the need to develop software appropriate for the changes in data storage. Topics include the need to shift from kernel operating systems, the relationship between input-output and memory, and the use of nonvolatile memory modules.
- Published
- 2018
- Full Text
- View/download PDF
14. Project Relate.
- Subjects
- *
DIGITAL storytelling , *WEB design , *COMPUTER-generated imagery , *AUTOMATIC speech recognition , *COMPUTER memory management - Published
- 2024
15. HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA.
- Author
-
Lau, Jason, Sivaraman, Aishwarya, Qian Zhang, Gulzar, Muhammad Ali, Cong, Jason, and Miryung Kim
- Subjects
FIELD programmable analog arrays ,PROGRAMMING languages ,COMPUTER memory management ,ABSTRACTION (Computer science) ,SOFTWARE engineering ,COMPUTER software - Abstract
Heterogeneous computing with field-programmable gate arrays (FPGAs) has demonstrated orders of magnitude improvement in computing efficiency for many applications. However, the use of such platforms so far is limited to a small subset of programmers with specialized hardware knowledge. High-level synthesis (HLS) tools made significant progress in raising the level of programming abstraction from hardware programming languages to C/C++, but they usually cannot compile and generate accelerators for kernel programs with pointers, memory management, and recursion, and require manual refactoring to make them HLS-compatible. In addition, experts need to provide heavily handcrafted optimizations to improve resource efficiency, which affects the maximum operating frequency, parallelization, and power efficiency. We propose a new dynamic invariant analysis and automated refactoring technique, called HeteroRefactor. First, HeteroRefactor monitors FPGA-specific dynamic invariants--the required bitwidth of integer and floating-point variables, and the size of recursive data structures and stacks. Second, using this knowledge of dynamic invariants, it refactors the kernel to make traditionally HLS-incompatible programs synthesizable and to optimize the accelerator's resource usage and frequency further. Third, to guarantee correctness, it selectively offloads the computation from CPU to FPGA, only if an input falls within the dynamic invariant. On average, for a recursive program of size 175 LOC, an expert FPGA programmer would need to write 185 more LOC to implement an HLS-compatible version, while HeteroRefactor automates such transformation. Our results on Xilinx FPGA show that HeteroRefactor minimizes BRAM by 83% and increases frequency by 42% for recursive programs; reduces BRAM by 41% through integer bitwidth reduction; and reduces DSP by 50% through floating-point precision tuning. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
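The integer-bitwidth part of the dynamic-invariant monitoring can be illustrated with a small helper: profile the values a variable takes across runs, then report the minimum width that covers them, adding a sign bit when negatives occur. This is a sketch of the idea only, not HeteroRefactor's analysis.

```python
def required_bitwidth(values):
    """Minimum integer bitwidth covering every observed value.

    Non-negative ranges get an unsigned width; ranges containing
    negatives get a two's-complement width with a sign bit."""
    lo, hi = min(values), max(values)
    if lo >= 0:
        return max(hi.bit_length(), 1)                      # unsigned
    # Signed: enough magnitude bits for both extremes, plus sign bit.
    return max(hi.bit_length(), (-lo - 1).bit_length()) + 1
```

For instance, a loop counter observed only in 0..200 needs 8 bits rather than a default 32, which is the kind of shrinkage that reduces BRAM usage on an FPGA.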
16. Challenges of Memory Management on Modern NUMA Systems.
- Author
-
GAUD, FABIEN, LEPERS, BAPTISTE, FUNSTON, JUSTIN, DASHTI, MOHAMMAD, FEDOROVA, ALEXANDRA, QUÉMA, VIVIEN, LACHAIZE, RENAUD, and ROTH, MARK
- Subjects
- *
NON-uniform memory access , *CLIENT/SERVER computing , *LINUX operating systems , *COMPUTER memory management , *COMPUTER algorithms - Abstract
This article presents an exploration into non-uniform memory access (NUMA) systems within computer servers. The author discusses the performance features of contemporary NUMA systems in contrast to older generations, describes examples of NUMA features within the Linux operating system, and offers a memory-management algorithm to improve performance and reduce access timing problems.
- Published
- 2015
- Full Text
- View/download PDF
17. Integrating region memory management and tag-free generational garbage collection.
- Author
-
ELSMAN, MARTIN and HALLENBERG, NIELS
- Subjects
- *
COMPUTER memory management , *COMPUTER algorithms , *COMPILERS (Computer programs) , *COMPUTER programmers , *LINUX operating systems - Abstract
We present a region-based memory management scheme with support for generational garbage collection. The scheme features a compile-time region inference algorithm, which associates values with logical regions, and builds on a region type system that deploys region types at runtime to avoid the overhead of write barriers and to support partly tag-free garbage collection. The scheme is implemented in the MLKit Standard ML compiler, which generates native x64 machine code. Besides demonstrating a number of important formal properties of the scheme, we measure the scheme's characteristics, for a number of benchmarks, and compare the performance of the generated executables with the performance of executables generated with the MLton state-of-the-art Standard ML compiler and configurations of the MLKit with and without region inference and generational garbage collection enabled. Although region inference often serves the purpose of generations, combining region inference with generational garbage collection is shown often to be superior to combining region inference with non-generational collection despite the overhead introduced by a larger amount of memory waste, due to region fragmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
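The region discipline the paper builds on can be sketched without a compiler: values are allocated into a region, and the whole region is deallocated at once with no per-object free and no tracing. Region inference automates this assignment at compile time; here it is done by hand, and all names are illustrative.

```python
class Region:
    """A logical memory region: everything allocated into it lives
    until the region as a whole is deallocated in one step."""

    def __init__(self):
        self.objects = []
        self.live = True

    def alloc(self, value):
        assert self.live, "allocation into a deallocated region"
        self.objects.append(value)
        return value

    def dealloc(self):
        # Bulk free, like leaving the scope a region is tied to.
        self.objects.clear()
        self.live = False

# Typical usage: a short-lived region holding intermediate results.
scratch = Region()
rows = [scratch.alloc([i, i * i]) for i in range(4)]
total = sum(r[1] for r in rows)   # consume the intermediates
scratch.dealloc()                 # all of them freed together
```

Values that outlive such a region are exactly what the paper's generational collector handles alongside region deallocation.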
18. Understanding the "this" reference in object oriented programming: Misconceptions, conceptions, and teaching recommendations.
- Author
-
Shmallo, Ronit and Ragonis, Noa
- Subjects
COMPUTER science education ,OBJECT-oriented methods (Computer science) ,ENGINEERING students ,COMPUTER memory management ,COMMON misconceptions - Abstract
The paper presents research that aims to expose students' understanding of the this reference in object-oriented programming. The study was conducted with high school students (N = 86) and college engineering students (N = 77). Conceptualization of this reflects an understanding of objects in general and involves aspects of programming variants and programmers' preferences as well. To examine students' conceptions, perceptions, and misconceptions, we developed a diagnostic tool that uses this in various contexts, such as in constructors, as a visible parameter, for calling an overloaded constructor in a class, or when converting a non-static method that uses this into a static one. The detailed analysis revealed difficulties, in both groups of participants, in conceptualizing the meaning of this as the current object and in its various uses in code. The discussion presents students' conceptions of "what is this", nine misconceptions that we characterized, and answers to our research questions. The conclusion offers recommendations for teaching and learning processes in light of the results obtained. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
19. just add memory.
- Author
-
Ventra, Massimiliano Di and Pershin, Yuriy V.
- Subjects
- *
COMPUTER memory management , *DYNAMIC storage allocation (Computer science) , *DISTRIBUTED shared memory , *COMPUTER storage devices , *HIGH performance processors , *MICROPROCESSORS , *EQUIPMENT & supplies - Abstract
The article focuses on the development of electronic components in memcomputing, which is the use of computers containing superfast memory processing hardware. Topics include the development of processors that store computer memory, technological advancement in electronics, and the development of 'mem' components such as meminductors and memcapacitors.
- Published
- 2015
- Full Text
- View/download PDF
20. El método de memoria interna y su aplicación en sistemas industriales [The internal memory method and its application in industrial systems].
- Author
-
Soria Tello, Saturnino, Esparza Mendoza, Francisco Javier, Hernandez Salazar, Luis Rolando, Loya Cabrera, Alejandro Eutimio, and Martínez Acosta, Gabriela Guadalupe
- Subjects
- *
MANUFACTURING processes , *AUTOMATION , *COMPUTER memory management - Abstract
This article aims to validate the internal memory method for explaining industrial automatic systems that require more than one memory state for their operation. The method solves, in a structured and methodical way, industrial automatic systems of the electrical type. With it, the operation of industrial systems can be understood in a simple but standardized way, which is necessary for applying the results obtained both in teaching and in industrial practice. [ABSTRACT FROM AUTHOR]
- Published
- 2020
21. Lightweight memory tracing for hot data identification.
- Author
-
Lee, Yunjae, Kim, Yoonhee, and Yeom, Heon Y.
- Subjects
- *
COMPUTER storage devices , *MEMORY , *DATA management , *IDENTIFICATION , *QUALITY of work life , *COMPUTER memory management - Abstract
The low capacity of main memory has become a critical issue in the performance of systems. Several memory schemes utilizing multiple classes of memory devices are used to mitigate the problem, hiding the small capacity by placing data in the proper memory devices based on the hotness of the data. Memory tracers can provide such hotness information, but existing tracing tools incur extremely high overhead, and the overhead increases as the problem size of a workload grows. In this paper, we propose Daptrace, built for tracing memory accesses with bounded, light overhead. Its two main techniques, region-based sampling and adaptive region construction, maintain a low overhead regardless of the program size. For evaluation, we traced a wide range of 20 workloads and compared the results with a baseline. The results show that Daptrace has a very small runtime and storage-space overhead (1.95% and 5.38 MB on average) while maintaining tracing quality regardless of the working-set size of a workload. Also, a case study on out-of-core memory management exhibits Daptrace's high potential for optimal data management. From the evaluation results, we conclude that Daptrace performs well at identifying hot memory objects. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
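Region-based sampling can be modeled in a few lines: sample a fraction of accesses and aggregate counts per fixed-size address region, so the cost of tracing does not grow with the working set. Daptrace additionally adapts region boundaries at runtime, which this sketch omits; region size, sampling rate, and addresses are illustrative.

```python
import random

def trace_hot_regions(accesses, region_size=4096, sample_rate=0.1, seed=0):
    """Count sampled accesses per fixed-size region rather than per
    address, keeping tracer overhead bounded."""
    rng = random.Random(seed)   # seeded for reproducible sampling
    counts = {}
    for addr in accesses:
        if rng.random() < sample_rate:       # sample a fraction of accesses
            region = addr // region_size     # region this address falls in
            counts[region] = counts.get(region, 0) + 1
    return counts

# A hot region (many accesses near address 0) vs. a cold one (region 100).
accesses = [8, 16, 24] * 2000 + [100 * 4096] * 50
counts = trace_hot_regions(accesses)
hottest = max(counts, key=counts.get)
```

Even at a 10% sampling rate, the relative ordering of regions by hotness is preserved, which is all a data-placement policy needs.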
22. Low-overhead dynamic sharing of graphics memory space in GPU virtualization environments.
- Author
-
Gu, Minwoo, Park, Younghun, Kim, Youngjae, and Park, Sungyong
- Subjects
- *
MEMORY , *ALGORITHMS , *COMPUTER memory management , *SPACE , *SHARING , *VIRTUAL machine systems , *SCALABILITY - Abstract
The proliferation of GPU-intensive workloads has created a new challenge for low-overhead and efficient GPU virtualization solutions over GPU clouds. gVirt is a full GPU virtualization solution for Intel's integrated GPUs, which share the system's on-board memory for graphics memory. In order to solve the inherent scalability limitation on the number of simultaneous virtual machines (VMs) in gVirt, gScale proposed a dynamic sharing scheme for global graphics memory among VMs by copying the entries in a private graphics translation table (GTT) to a physical GTT along with a GPU context switch. However, copying entries between the private GTT and the physical GTT often causes significant overhead, which becomes worse when the global graphics memory space shared by the VMs overlaps. This paper identifies the copy overhead caused by GPU context switches as one of the major bottlenecks to performance improvement and proposes a low-overhead dynamic memory management scheme called DymGPU. DymGPU provides two memory allocation algorithms: size-based and utilization-based. While the size-based algorithm allocates memory space based on the memory size required by each VM, the utilization-based algorithm considers the GPU utilization of each VM to allocate memory space. DymGPU is also dynamic in the sense that the global graphics memory space used by each VM is rearranged at runtime by periodically checking idle VMs and the GPU utilization of each runnable VM. We have implemented the proposed approach in gVirt and confirmed that it reduces GPU context switch time by up to 53% and improves the overall performance of various GPU applications by up to 39%. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
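The difference between the two allocation policies is easy to state in code. A sketch of the utilization-based policy, with VM names, sizes, and weights purely illustrative of DymGPU's idea rather than its implementation:

```python
def utilization_based_allocation(total_mem, vms):
    """Split graphics memory among VMs in proportion to their GPU
    utilization instead of their requested size.

    `vms` maps VM name -> utilization (any non-negative weight).
    Integer division may leave a few unassigned bytes; a real
    allocator would hand the remainder to one of the VMs."""
    total_util = sum(vms.values())
    if total_util == 0:
        share = total_mem // len(vms)        # all idle: split evenly
        return {vm: share for vm in vms}
    return {vm: total_mem * u // total_util for vm, u in vms.items()}
```

A size-based policy would instead divide `total_mem` in proportion to each VM's requested size; the utilization-based variant starves idle VMs of graphics memory and gives busy ones more room.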
23. Optimised memory allocation for less false abortion and better performance in hardware transactional memory.
- Author
-
Li, Xiuhong and Gulila, Altenbek
- Subjects
- *
ABORTION , *MEMORY , *PERFORMANCES , *COMPUTER memory management , *TRANSACTION systems (Computer systems) , *HARDWARE - Abstract
This paper introduces and tackles a special performance hazard in Hardware Transactional Memory (HTM): false abortion. False abortion causes many unnecessary transaction abortions in HTM and can greatly impact performance, limiting HTM's usefulness when it is adopted as a fast path for Software Transactional Memory. By introducing a new memory allocator design, we are able to put objects that are likely to be accessed together from different threads into different cache lines and thus avoid conflicts between hardware transactions in different threads. Experiments show that our method can reduce transaction abortions by 47% and achieve a speedup of up to 1.67× (22% on average), while consuming only 14% more memory, showing great potential to enhance current HTM technology. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
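The allocator-level remedy can be sketched as a padding rule: round each object up to a whole number of cache lines, so objects handed out to different threads can never share a line and thus never conflict spuriously inside a hardware transaction. This is a hypothetical simplification of the paper's allocator; the line size and layout are illustrative.

```python
CACHE_LINE = 64  # bytes, typical on x86

def padded_size(size, line=CACHE_LINE):
    """Round an allocation request up to a whole number of cache lines."""
    return ((size + line - 1) // line) * line

def thread_local_offsets(sizes, line=CACHE_LINE):
    """Lay out one thread's objects back to back on padded boundaries.
    Running this separately per thread yields line-disjoint placements,
    which is what prevents false transaction aborts."""
    offsets, cursor = [], 0
    for s in sizes:
        offsets.append(cursor)
        cursor += padded_size(s, line)
    return offsets
```

The memory cost of the padding is the "14% more memory" trade-off the abstract reports in spirit: space is exchanged for fewer aborted transactions.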
24. dOCAL: high-level distributed programming with OpenCL and CUDA.
- Author
-
Rasch, Ari, Bigge, Julian, Wrodarczyk, Martin, Schulze, Richard, and Gorlatch, Sergei
- Subjects
- *
COMPUTER memory management , *GRAPHICS processing units , *PARALLEL programming , *COMPUTER storage devices - Abstract
In the state-of-the-art parallel programming approaches OpenCL and CUDA, so-called host code is required for a program's execution. Efficiently implementing host code is often a cumbersome task, especially when executing OpenCL and CUDA programs on systems with multiple nodes, each comprising different devices, e.g., multi-core CPUs and graphics processing units; the programmer is responsible for explicitly managing each node's and device's memory, synchronizing computations with data transfers between devices of potentially different nodes, and optimizing data transfers between devices' memories and nodes' main memories, e.g., by using pinned main memory to accelerate data transfers and overlapping the transfers with computations. We develop the distributed OpenCL/CUDA abstraction layer (dOCAL)—a novel high-level C++ library that simplifies the development of host code. dOCAL combines major advantages over the state-of-the-art high-level approaches: (1) it simplifies implementing both OpenCL and CUDA host code by providing a simple-to-use, high-level abstraction API; (2) it supports executing arbitrary OpenCL and CUDA programs; (3) it allows conveniently targeting the devices of different nodes by automatically managing node-to-node communications; (4) it simplifies implementing data transfer optimizations by providing different, specially allocated memory regions, e.g., pinned main memory for overlapping data transfers with computations; (5) it optimizes memory management by automatically avoiding unnecessary data transfers; (6) it enables interoperability between OpenCL and CUDA host code for systems with devices from different vendors. Our experiments show that dOCAL significantly simplifies the development of host code for heterogeneous and distributed systems, with a low runtime overhead. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
25. Robust and efficient memory management in Apache AsterixDB.
- Author
-
Kim, Taewoo, Behm, Alexander, Blow, Michael, Borkar, Vinayak, Bu, Yingyi, Carey, Michael J., Hubail, Murtadha, Jahangiri, Shiva, Jia, Jianfeng, Li, Chen, Luo, Chen, Maxon, Ian, and Pirzadeh, Pouria
- Subjects
COMPUTER memory management ,RELATIONAL databases ,DATABASES ,SHORT-term memory ,CACHE memory ,COMPUTER workstation clusters ,MEMORY - Abstract
Summary: Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory‐intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory‐intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory‐intensive operators requires a careful design and application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need the ability to handle large amounts of data similarly, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open‐source Big Data management software platform that scales out horizontally on shared‐nothing commodity computing clusters. We describe the implementation of AsterixDB's memory‐intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
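The budgeting discipline described above reduces to simple bookkeeping: grant each memory-intensive operator a budget from a global pool and refuse requests that would exceed it, forcing the operator to spill to disk instead. A minimal sketch of that bookkeeping, with names and numbers illustrative rather than AsterixDB's actual accounting:

```python
class MemoryBudget:
    """Hand out memory budgets to operators (sorts, joins) from a
    fixed global pool; a refused request means the operator must
    fall back to a disk-based (spilling) algorithm."""

    def __init__(self, total_bytes):
        self.total = total_bytes
        self.used = 0
        self.grants = {}          # operator name -> bytes granted

    def request(self, operator, nbytes):
        if self.used + nbytes > self.total:
            return False          # over budget: caller must spill
        self.grants[operator] = self.grants.get(operator, 0) + nbytes
        self.used += nbytes
        return True

    def release(self, operator):
        # Operator finished: return its entire grant to the pool.
        self.used -= self.grants.pop(operator, 0)
```

Capping each operator this way is what lets a system process data far larger than memory without any single sort or join exhausting the pool.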
26. F-STONE: A Fast Real-Time DDOS Attack Detection Method Using an Improved Historical Memory Management.
- Author
-
Nooribakhsh, Mahsa and Mollamotalebi, Mahdi
- Subjects
DENIAL of service attacks ,COLLECTIVE memory ,INTERNET protocol address ,DATA structures ,COMPUTER memory management ,IP networks - Abstract
Distributed Denial of Service (DDoS) is a common attack in recent years that can deplete the bandwidth of victim nodes by flooding packets. Based on the type and quantity of traffic used for the attack and the exploited vulnerability of the target, DDoS attacks are grouped into three categories: volumetric attacks, protocol attacks, and application attacks. The volumetric attack, which the proposed method attempts to detect, is the most common type of DDoS attack. The aim of this paper is to reduce the delay of real-time detection of DDoS attacks by utilizing hybrid structures based on data stream algorithms. The proposed data structure (BHM) improves the data storing mechanism presented in the STONE method and consequently reduces the detection time. STONE characterizes the regular network traffic of a service by aggregating it into common prefixes of IP addresses, and detects attacks when the aggregated traffic deviates from the regular one. In BHM, history refers to the output traffic information obtained from each monitoring period to form a reference profile. The reference profile is created from historical information and includes only normal traffic information. The delay of DDoS attack detection increases in STONE due to the long time intervals between monitoring periods. The proposed method (F-STONE) has been compared to STONE based on attack detection time, Expected Profile Update Time (EPUT), and rate of attack detection. The evaluation results indicated significant improvements in terms of the EPUT, acceleration of attack detection, and reduction of the false positive rate. [ABSTRACT FROM AUTHOR]
- Published
- 2020
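STONE's prefix aggregation and deviation test can be modeled compactly: count traffic per common IP prefix to form a reference profile, then flag prefixes whose observed traffic deviates from that profile. This sketches only the aggregation idea, not F-STONE's BHM structure; the prefix length, threshold factor, and addresses are illustrative.

```python
def prefix_counts(ips, prefix_len=16):
    """Aggregate packet counts by common IP prefix (here, the first
    two octets of a dotted quad when prefix_len is 16)."""
    counts = {}
    for ip in ips:
        a, b, *_ = ip.split(".")
        key = f"{a}.{b}" if prefix_len == 16 else a
        counts[key] = counts.get(key, 0) + 1
    return counts

def deviates(reference, observed, factor=3.0):
    """Flag prefixes whose observed traffic exceeds the reference
    profile by `factor` -- a stand-in for the deviation test that
    triggers volumetric-attack detection."""
    return [p for p, n in observed.items()
            if n > factor * reference.get(p, 1)]
```

A flood from one network shows up as a single prefix whose count dwarfs its reference value, which is what makes prefix aggregation cheap enough for real-time detection.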
27. Kilobyte Virtual Machine Memory Management.
- Author
-
Aboubacar, Adamou Souleymane
- Subjects
VIRTUAL machine systems ,EMBEDDED computer systems ,COMPUTER memory management ,COMPUTER storage devices ,COMPUTER algorithms - Abstract
Java Virtual Machines use various automatic garbage collector algorithms to manage memory over the object lifecycle. However, these algorithms have some drawbacks. In order to improve memory management in embedded systems, we implement a new garbage collector algorithm on the Kilobyte Virtual Machine. This algorithm applies mark-sweep garbage collection with memory compaction to the pages most used in the recent past, and mark-sweep garbage collection without compaction to the pages least used in the recent past. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
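The hotness-split collection policy described in the abstract above can be illustrated with a toy model (this is a sketch of the general idea, not the paper's KVM implementation; the `Page` class, slot count, and hotness flag are all made up for illustration):

```python
# Toy model of a hotness-aware garbage collector: pages used heavily in
# the recent past ("hot") are compacted after sweeping, while cold pages
# are only mark-swept in place. All names and sizes are illustrative.

class Page:
    def __init__(self, hot):
        self.hot = hot            # was this page heavily used recently?
        self.slots = [None] * 8   # fixed number of object slots

def collect(page, live):
    """Mark-sweep a page; compact it only if it is hot."""
    # Sweep: drop every object that is not in the live set.
    page.slots = [obj if obj in live else None for obj in page.slots]
    if page.hot:
        # Compact: slide the surviving objects to the front of the page.
        survivors = [obj for obj in page.slots if obj is not None]
        page.slots = survivors + [None] * (len(page.slots) - len(survivors))
    return page

hot = Page(hot=True)
hot.slots = ['a', None, 'b', 'dead', None, 'c', None, None]
collect(hot, live={'a', 'b', 'c'})
assert hot.slots[:3] == ['a', 'b', 'c']   # compacted to the front

cold = Page(hot=False)
cold.slots = ['a', None, 'dead', 'b', None, None, None, None]
collect(cold, live={'a', 'b'})
assert cold.slots == ['a', None, None, 'b', None, None, None, None]  # swept in place
```

Compacting only hot pages keeps the allocation-heavy pages defragmented while sparing cold pages the cost of moving objects.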
28. Orthogonal persistence in nonvolatile memory architectures: A persistent heap design and its implementation for a Java Virtual Machine.
- Author
-
Perez, Taciano D., Neves, Marcelo V., Medaglia, Diego, Monteiro, Pedro H. G., and De Rose, César A. F.
- Subjects
NONVOLATILE memory ,PROGRAMMING languages ,DATA warehousing ,ARCHITECTURE ,COMPUTER systems ,RANDOM access memory ,COMPUTER memory management - Abstract
Summary: Current computer systems separate main memory from storage, and programming languages typically reflect this distinction using different representations for data in memory and storage. However, moving data back and forth between these different layers and representations compromise both programming and execution efficiency. To remedy this, the concept of orthogonal persistence (OP) was proposed in the early 1980s advocating that, from a programmer's standpoint, there should be no differences in the way that short-term and long-term data are manipulated. However, at that time, the underlying implementations still had to cope with the complexity of moving data across memory and storage. Today, recent nonvolatile memory (NVM) technologies, such as resistive RAM and phase-change memory, allow main memory and storage to be collapsed into a single layer of persistent memory, opening the way for more efficient programming abstractions for handling persistence. In this work, we revisit OP concepts in the context of NVM architectures and propose a persistent heap design for languages with automatic memory management. We demonstrate how it can significantly increase programmer and execution efficiency, removing the impedance mismatch of crossing semantic boundaries. To validate and demonstrate the presented concepts, we present JaphaVM, an implementation of the proposed design based on JamVM, an open-source Java Virtual Machine. Our results show that JaphaVM, in most cases, executes the same operations between one and two orders of magnitude faster than regular database-based and file-based implementations, while requiring significantly fewer lines of code. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
29. Hotness- and Lifetime-Aware Data Placement and Migration for High-Performance Deep Learning on Heterogeneous Memory Systems.
- Author
-
Han, Myeonggyun, Hyun, Jihoon, Park, Seongbeom, and Baek, Woongki
- Subjects
- *
DEEP learning , *COMPUTER memory management , *DYNAMIC random access memory , *MEMORY , *SYSTEMS software , *RANDOM access memory - Abstract
Heterogeneous memory systems that comprise memory nodes with disparate architectural characteristics (e.g., DRAM and high-bandwidth memory (HBM)) have surfaced as a promising solution in a variety of computing domains ranging from embedded to high-performance computing. Since deep learning (DL) is one of the most widely-used workloads in various computing domains, it is crucial to explore efficient memory management techniques for DL applications that execute on heterogeneous memory systems. Despite extensive prior works on system software and architectural support for efficient DL, it still remains unexplored to investigate heterogeneity-aware memory management techniques for high-performance DL on heterogeneous memory systems. To bridge this gap, we analyze the characteristics of representative DL workloads on a real heterogeneous memory system. Guided by the characterization results, we propose HALO, hotness- and lifetime-aware data placement and migration for high-performance DL on heterogeneous memory systems. Through quantitative evaluation, we demonstrate the effectiveness of HALO in that it significantly outperforms various memory management policies (e.g., 28.2 percent higher performance than the HBM-Preferred policy) supported by the underlying system software and hardware, achieves the performance comparable to the ideal case with infinite HBM, incurs small performance overheads, and delivers high performance across a wide range of application working-set sizes. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
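The hotness-aware placement idea from the abstract above can be sketched as a greedy toy policy (this is not the HALO algorithm itself; the tensor names, sizes, and capacity are hypothetical, and real systems also weigh lifetime and migration cost):

```python
# Illustrative sketch: place the hottest data in the small fast tier
# (e.g., HBM) and spill the rest to the large slow tier (e.g., DRAM),
# ranked by access count. Purely a toy model of hotness-aware placement.

def place(tensors, fast_capacity):
    """tensors: {name: (size, access_count)} -> {name: tier}."""
    placement, used = {}, 0
    # Greedy: hottest data first, while it still fits in the fast tier.
    for name, (size, hits) in sorted(tensors.items(),
                                     key=lambda kv: kv[1][1], reverse=True):
        if used + size <= fast_capacity:
            placement[name] = 'HBM'
            used += size
        else:
            placement[name] = 'DRAM'
    return placement

p = place({'weights': (4, 90), 'activations': (2, 70), 'logs': (6, 1)},
          fast_capacity=6)
assert p == {'weights': 'HBM', 'activations': 'HBM', 'logs': 'DRAM'}
```

The paper's contribution is deciding such placements (and migrations) dynamically from observed hotness and lifetime; the greedy ranking above only conveys the basic intuition.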
30. Huge Page Friendly Virtualized Memory Management.
- Author
-
Sha, Sai, Hu, Jing-Yuan, Luo, Ying-Wei, Wang, Xiao-Lin, and Wang, Zhenlin
- Subjects
COMPUTER memory management ,PHASE change memory ,MEMORY ,DYNAMIC programming ,MACHINE performance - Abstract
With the rapid increase of memory consumption by applications running on cloud data centers, we need more efficient memory management in a virtualized environment. Exploiting huge pages becomes more critical for a virtual machine's performance when it runs large working set size programs. Programs with large working set sizes are more sensitive to memory allocation, which requires us to quickly adjust the virtual machine's memory to accommodate memory phase changes. It would be much more efficient if we could adjust virtual machines' memory at the granularity of huge pages. However, existing virtual machine memory reallocation techniques, such as ballooning, do not support huge pages. In addition, in order to drive effective memory reallocation, we need to predict the actual memory demand of a virtual machine. We find that traditional memory demand estimation methods designed for regular pages cannot be simply ported to a system adopting huge pages. How to adjust the memory of virtual machines timely and effectively according to the periodic change of memory demand is another challenge we face. This paper proposes a dynamic huge page based memory balancing system (HPMBS) for efficient memory management in a virtualized environment. We first rebuild the ballooning mechanism in order to dispatch memory in the granularity of huge pages. We then design and implement a huge page working set size estimation mechanism which can accurately estimate a virtual machine's memory demand in huge pages environments. Combining these two mechanisms, we finally use an algorithm based on dynamic programming to achieve dynamic memory balancing. Experiments show that our system saves memory and improves overall system performance with low overhead. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
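The huge-page-granularity ballooning in the abstract above can be illustrated with a toy target calculation (a sketch under stated assumptions, not HPMBS itself; the slack margin and the `balloon_target` helper are invented for illustration):

```python
# Toy sketch of huge-page ballooning: given an estimated working set
# size (WSS) for a VM, compute how many 2 MiB huge pages the balloon
# should reclaim (positive) or return (negative). Illustrative only.

HUGE_PAGE = 2 * 1024 * 1024  # 2 MiB huge page

def balloon_target(allocated_bytes, estimated_wss_bytes, slack_pages=2):
    """Huge pages to reclaim (>0) or give back (<0) to fit the WSS."""
    need = -(-estimated_wss_bytes // HUGE_PAGE) + slack_pages  # ceil + slack
    have = allocated_bytes // HUGE_PAGE
    return have - need

# VM holds 512 huge pages (1 GiB) but only uses ~300 MiB: reclaim pages.
assert balloon_target(512 * HUGE_PAGE, 300 * 1024 * 1024) == 360
# VM holds 100 huge pages but needs ~300 MiB: deflate by 52 pages.
assert balloon_target(100 * HUGE_PAGE, 300 * 1024 * 1024) == -52
```

The hard parts the paper addresses are estimating the WSS accurately in huge-page environments and rebuilding the balloon driver to move memory at this granularity; the arithmetic above only shows why a per-huge-page target is the natural control variable.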
31. Tracing and Profiling Machine Learning Dataflow Applications on GPU.
- Author
-
Zins, Pierre and Dagenais, Michel
- Subjects
- *
MACHINE learning , *GRAPH algorithms , *HUMAN facial recognition software , *COMPUTER memory management , *SIGNAL processing , *PAPER arts , *GRAPHICS processing units - Abstract
In this paper, we propose a profiling and tracing method for dataflow applications with GPU acceleration. Dataflow models can be represented by graphs and are widely used in many domains like signal processing or machine learning. Within the graph, the data flows along the edges, and the nodes correspond to the computing units that process the data. To accelerate the execution, some co-processing units, like GPUs, are often used for computing intensive nodes. The work in this paper aims at providing useful information about the execution of the dataflow graph on the available hardware, in order to understand and possibly improve the performance. The collected traces include low-level information about the CPU, from the Linux Kernel (system calls), as well as mid-level and high-level information respectively about intermediate libraries like CUDA, HIP or HSA, and the dataflow model. This is followed by post-mortem analysis and visualization steps in order to enhance the trace and show useful information to the user. To demonstrate the effectiveness of the method, it was evaluated for TensorFlow, a well-known machine learning library that uses a dataflow computational graph to represent the algorithms. We present a few examples of machine learning applications that can be optimized with the help of the information provided by our proposed method. For example, we reduce the execution time of a face recognition application by a factor of 5X. We suggest a better placement of the computation nodes on the available hardware components for a distributed application. Finally, we also enhance the memory management of an application to speed up the execution. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
32. Efficient State Retention through Paged Memory Management for Reactive Transient Computing.
- Author
-
Sliper, Sivert T., Balsamo, Domenico, Nikoleris, Nikos, Wang, William, Weddell, Alex S., and Merrett, Geoff V.
- Subjects
NONVOLATILE memory ,MICROCONTROLLERS ,INTERNET of things ,DATA encryption ,COMPUTER memory management - Abstract
Reactive transient computing systems preserve computational progress despite frequent power failures by suspending (saving state to nonvolatile memory) when detecting a power failure, and restoring once power returns. Existing methods inefficiently save and restore all allocated memory. We propose lightweight memory management that applies the concept of paging to load pages only when needed, and save only modified pages. We then develop a model that maximises available execution time by dynamically adjusting the suspend and restore voltage thresholds. Experiments on an MSP430FR5994 microcontroller show that our method reduces state retention overheads by up to 86.9% and executes algorithms up to 5.3x faster than the state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
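The save-only-modified-pages idea in the abstract above can be sketched in a few lines (a toy model, not the MSP430 implementation; the class and page counts are hypothetical):

```python
# Toy sketch of dirty-page state retention: on suspend (power failure
# imminent), save only the pages modified since the last checkpoint to
# nonvolatile memory, instead of the whole RAM image.

class PagedState:
    def __init__(self, n_pages):
        self.ram = [0] * n_pages
        self.nvm = [0] * n_pages      # nonvolatile copy
        self.dirty = set()

    def write(self, page, value):
        self.ram[page] = value
        self.dirty.add(page)          # track the modification

    def suspend(self):
        """Save only dirty pages to NVM; return how many were written."""
        for p in self.dirty:
            self.nvm[p] = self.ram[p]
        saved, self.dirty = len(self.dirty), set()
        return saved

s = PagedState(n_pages=100)
s.write(3, 42)
s.write(7, 99)
assert s.suspend() == 2          # 2 pages saved, not 100
assert s.nvm[3] == 42 and s.nvm[7] == 99
```

Saving two pages instead of one hundred is exactly where the reported 86.9% overhead reduction comes from: suspend cost scales with the modified set, not the allocated set.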
33. A Wear-Leveling-Aware Fine-Grained Allocator for Non-Volatile Memory.
- Author
-
Xianzhang Chen, Zhuge Qingfeng, Qiang Sun, Sha, Edwin H.-M., Shouzhen Gu, Chaoshu Yang, and Chun Jason Xue
- Subjects
NONVOLATILE memory ,LINUX operating systems ,RESOURCE allocation ,METADATA ,COMPUTER memory management - Abstract
Emerging non-volatile memories (NVMs) are promising main memory for their advanced characteristics. However, the low endurance of NVM cells makes them vulnerable to frequent fine-grained updates. This paper proposes a Wear-leveling Aware Fine-grained Allocator (WAFA) for NVM. WAFA divides pages into basic memory units to support fine-grained updates. WAFA allocates the basic memory units of a page in a rotational manner to distribute fine-grained updates evenly over memory cells. The fragmented basic memory units of each page caused by memory allocation and deallocation operations are reorganized by a reform operation. We implement WAFA in Linux kernel 4.4.4. Experimental results show that WAFA can reduce the total page writes by 81.1% and 40.1% compared with NVMalloc and nvm_alloc, respectively, the state-of-the-art wear-conscious allocators for NVM. Meanwhile, WAFA shows 48.6% and 42.3% performance improvement over NVMalloc and nvm_alloc, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
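The rotational allocation scheme in the abstract above can be modeled with a toy page (a sketch of the idea, not WAFA's actual data structures; the unit count and wear counters are illustrative):

```python
# Toy sketch of rotational sub-page allocation for wear leveling:
# successive allocations of a page's basic memory units start at a
# rotating offset, so repeated small writes are spread across all the
# cells of the page rather than hammering the first unit.

class RotatingPage:
    UNITS = 8                     # basic memory units per page

    def __init__(self):
        self.next = 0             # rotating start position
        self.wear = [0] * self.UNITS

    def alloc_unit(self):
        """Hand out the next unit in rotation and count one write."""
        u = self.next
        self.next = (self.next + 1) % self.UNITS
        self.wear[u] += 1
        return u

page = RotatingPage()
units = [page.alloc_unit() for _ in range(16)]   # 16 small allocations
assert units[:8] == list(range(8))               # first lap covers every unit
assert page.wear == [2] * 8                      # wear is spread evenly
```

Without rotation, a naive first-fit allocator would concentrate all 16 writes on the lowest free units, which is precisely the endurance hazard the paper targets.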
34. Structured Deferral: Synchronization via Procrastination.
- Author
-
MCKENNEY, PAUL E.
- Subjects
- *
SYNCHRONIZATION , *SOFTWARE architecture , *TIME measurements , *COMPUTER software , *COMPUTER software development , *COMPUTER memory management - Abstract
The article discusses synchronization via procrastination as a structured deferral in software design. Topics covered include lazy approaches like reference counting, garbage collection and lazy evaluation, the significant time needed to detect hardware failure, and the usefulness of synchronization via procrastination when interacting with external state. Also mentioned are the key insight behind hazard pointers, the shortcomings inherent in the strengths of hazard pointers, and the read-side disagreements on current state.
- Published
- 2013
- Full Text
- View/download PDF
35. Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs.
- Author
-
Knap, Marcin and Czarnul, Paweł
- Subjects
- *
PERFORMANCE evaluation , *MEMORY , *DYNAMIC simulation , *IMAGE processing , *COMPUTER memory management - Abstract
The paper presents assessment of Unified Memory performance with data prefetching and memory oversubscription. Several versions of code are used with: standard memory management, standard Unified Memory and optimized Unified Memory with programmer-assisted data prefetching. Evaluation of execution times is provided for four applications: Sobel and image rotation filters, stream image processing and computational fluid dynamic simulation, performed on Pascal and Volta architecture GPUs—NVIDIA GTX 1080 and NVIDIA V100 cards. Furthermore, we evaluate the possibility of allocating more memory than available on GPUs and assess performance of codes using the three aforementioned implementations, including memory oversubscription available in CUDA. Results serve as recommendations and hints for other similar codes regarding expected performance on modern and already widely available GPUs. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
36. An expert system for checking the correctness of memory systems using simulation and metamorphic testing.
- Author
-
Cañizares, Pablo C., Núñez, Alberto, and de Lara, Juan
- Subjects
- *
EXPERT systems , *COMPUTER memory management , *MEMORY testing , *COMPUTER performance , *SIMULATION methods & models , *COMPUTING platforms , *COMPUTER software correctness - Abstract
• A novel expert system for checking the correctness of memory systems. • The expert system properly combines simulation and metamorphic testing. • The expert system automatically generates test cases to check memory models. • Mutation testing has been applied to check the effectiveness of the ES. • The ES provides promising results, detecting over 99% of critical injected faults. During the last few years, computer performance has reached a turning point where computing power is no longer the only important concern. This way, the emphasis is shifting from an exclusive focus on the optimisation of the computing system to optimising other systems, like the memory system. Broadly speaking, testing memory systems entails two main challenges: the oracle problem and the reliable test set problem. The former consists in deciding if the outputs of a test suite are correct. The latter refers to providing an appropriate test suite for determining the correctness of the system under test. In this paper we propose an expert system for checking the correctness of memory systems. In order to face these challenges, our proposed system combines two orthogonal techniques – simulation and metamorphic testing – enabling the automatic generation of appropriate test cases and deciding if their outputs are correct. In contrast to conventional expert systems, our system includes a factual database containing the results of previous simulations, and a simulation platform for computing the behaviour of memory systems. The knowledge of the expert is represented in the form of metamorphic relations, which are properties of the analysed system involving multiple inputs and their outputs. Thus, the main contribution of this work is two-fold: a method to automatise the testing process of memory systems, and a novel expert system design focusing on increasing the overall performance of the testing process.
To show the applicability of our system, we have performed a thorough evaluation using 500 memory configurations and 4 different memory management algorithms, which entailed the execution of more than one million of simulations. The evaluation used mutation testing, injecting faults in the memory management algorithms. The developed expert system was able to detect over 99% of the critical injected faults, hence obtaining very promising results, and outperforming other standard techniques like random testing. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
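A metamorphic relation like those described in the abstract above sidesteps the oracle problem by comparing two runs of the system instead of checking one run against a known answer. As an illustrative example (not one of the paper's relations), a classic property of LRU replacement is that enlarging the cache can never increase the miss count on the same trace:

```python
# Toy metamorphic test for a memory model: for LRU, a larger cache must
# never miss more on the same trace (the LRU stack/inclusion property).
# The relation compares two simulation outputs, so no exact expected
# miss count (oracle) is needed.

from collections import OrderedDict

def lru_misses(trace, capacity):
    cache, misses = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)            # refresh recency
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)      # evict least recently used
    return misses

trace = [1, 2, 3, 1, 4, 2, 5, 1, 2, 3]
small, big = lru_misses(trace, 2), lru_misses(trace, 4)
assert big <= small    # the metamorphic relation holds for a correct model
```

A mutation that breaks the eviction policy (e.g., evicting the most recently used entry) can violate this inequality on some trace, which is how such relations catch injected faults without an explicit oracle.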
37. Cross-state events: A new approach to parallel discrete event simulation and its speculative runtime support.
- Author
-
Pellegrini, Alessandro and Quaglia, Francesco
- Subjects
- *
DISCRETE event simulation , *TERRITORIAL partition , *COMPUTER memory management - Abstract
We present a new approach to Parallel Discrete Event Simulation (PDES), where we enable the execution of so-called cross-state events. During their processing, the state of multiple concurrent simulation objects can be accessed in read/write mode, as opposed to classical partitioned accesses. This is done with no pre-declaration of this type of access by the programmer, hence also coping with non-determinism. In our proposal, cross-state events are supported by a speculative runtime environment fully transparently to the application code. This is done through an ad-hoc memory management architecture and an extension of the classical Time Warp synchronization protocol. This extension, named Event and Cross-State (ECS) synchronization, ensures causally-consistent speculative parallel execution of discrete event applications by allowing all events to observe the snapshot of the model execution trajectory that would have been observed in a timestamp-ordered execution of the same model. An experimental assessment of our proposal shows how it can significantly reduce the application development complexity, while also providing advantages in terms of performance. • Introduction of a new concept of "event" in parallel discrete event simulation, called cross-state event. • Introduction of a fully transparent runtime support for parallel discrete event simulation with cross-state events. • Achievements of advantages in terms of both performance and programmability of parallel discrete event simulation models. • Enabling of a fully innovative hybrid model-execution environment based on a mix of state partitioning and state sharing across concurrent simulation objects. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
38. Hierarchical Hybrid Memory Management in OS for Tiered Memory Systems.
- Author
-
Liu, Lei, Yang, Shengjie, Peng, Lu, and Li, Xinyu
- Subjects
- *
COMPUTER memory management , *MEMORY , *RANDOM access memory , *ENERGY consumption - Abstract
The emerging hybrid DRAM-NVM architecture is challenging the existing memory management mechanism at the level of the architecture and operating system. In this paper, we introduce Memos, a memory management framework which can hierarchically schedule memory resources over the entire memory hierarchy including cache, channels, and main memory comprising DRAM and NVM simultaneously. Powered by our newly designed kernel-level monitoring module that samples the memory patterns by combining TLB monitoring with page walks, and page migration engine, Memos can dynamically optimize the data placement in the memory hierarchy in response to the memory access pattern, current resource utilization, and memory medium features. Our experimental results show that Memos can achieve high memory utilization, improving system throughput by around 20.0 percent; reduce the memory energy consumption by up to 82.5 percent; and improve the NVM lifetime by up to 34X. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
39. Learning memory management with C‐Sim: A C‐based visual tool.
- Author
-
Perez‐Schofield, Baltasar García, Rivera, Matías García, Ortin, Francisco, and Lado, María J.
- Subjects
ORGANIZATIONAL learning ,CONCEPT learning ,COMPUTER science ,PROGRAMMING languages ,KNOWLEDGE management ,COMPUTER memory management - Abstract
Nowadays, Computer Science (CS) students must cope with continuous challenges related to programming skill acquisition. On some occasions, they have to deal with the internals of memory management (pointers, pointer arithmetic, and heap management), facing a vision of programming from the low abstraction level offered by C. Even using C++ and references, not all scenarios where objects or collections of objects need to be managed can be covered. Based on the difficulties identified when dealing with such low-level abstractions, the C-Sim application, aimed at learning these concepts in an easy way, has been developed. To support the tool, the C programming language was selected. It allows showing concepts while remaining as close as possible both to the hardware and the operating system. To validate C-Sim, pre- and post-tests were filled in by a group of 60 first-year CS students, who employed the tool to learn about memory management. Grades of students using C-Sim were also obtained and compared with those of students who did not use the tool the previous academic year. As main outcomes, 82.26% of students indicated that they had improved their programming and memory management knowledge, and 83.64% pointed out that the use of this type of tool improves the understanding and quality of the practice lessons. Furthermore, the marks of students have significantly increased. Finally, C-Sim was designed from the ground up as a learning aid and can be useful for lecturers, who can complement their lessons using interactive demonstrations. Students can also employ it to experiment and learn autonomously. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
40. A survey of optimization techniques for thermal-aware 3D processors.
- Author
-
Cao, Kun, Zhou, Junlong, Wei, Tongquan, Chen, Mingsong, Hu, Shiyan, and Li, Keqin
- Subjects
- *
INTEGRATED circuits , *MATHEMATICAL optimization , *COMPUTER memory management , *PRODUCTION scheduling , *ELECTRONIC circuits - Abstract
Interconnect scaling has become a major design challenge for traditional planar (2D) integrated circuits (ICs). Three-dimensional (3D) IC that stacks multiple device layers through 3D stacking technology is regarded as an effective solution to this dilemma. A promising 3D IC design direction is to construct 3D processors. However, 3D processors are likely to suffer from more serious thermal issues as compared to conventional 2D processors, which may hinder the employment or even offset the benefits of 3D stacking. Therefore, thermal-aware design techniques should be adopted to alleviate the thermal problems with 3D processors. In this survey, we review works on system level optimization techniques for thermal-aware 3D processor design from hierarchical perspectives of architecture, floorplanning, memory management, and task scheduling. We first survey 3D processor architectures to demonstrate how a 3D processor can be constructed by using 3D stacking technology, and present an overview of thermal characteristics of the constructed 3D processors. We then review thermal-aware floorplanning, memory management and task scheduling techniques to show how the thermal impact on 3D processor performance can be reduced. A systematic classification method is utilized throughout the survey to emphasize similarities and differences of various thermal-aware 3D processor optimization techniques. This paper shows that the thermal impact on 3D processors is manageable by adopting thermal-aware techniques, thus making 3D processors into the mainstream in the near future. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
41. Eliminating object reference checks by escape analysis on real-time Java virtual machine.
- Author
-
Feng, Wei, Shi, Xiaohua, and Wang, Wenru
- Subjects
- *
JAVA programming language , *REAL-time programming , *PROGRAMMING languages , *PROCESS optimization , *ESCAPES , *COMPUTER memory management - Abstract
The real-time specification for Java (RTSJ) makes Java a real-time programming language. However, the RTSJ's memory management system is more complicated than J2SE's. The assignment rules of RTSJ, which prevent the creation of dangling references, must be checked by real-time Java virtual machines (JVMs) at run-time. These frequent run-time object reference checks introduce significant time overheads and unpredictable execution time, which has a great impact on real-time systems. This paper presents an equivalence-class-based, context-sensitive and flow-insensitive escape analysis algorithm that effectively eliminates unnecessary run-time reference checkpoints in RTSJ programs. The optimization framework has been implemented in an open-source real-time JVM, namely jRate, and evaluated with CDx, a relatively authoritative real-time Java benchmark suite. The results show that this optimization algorithm eliminates more than 90% of static reference checkpoints, removes about 50% of run-time reference checkpoints on average, and improves run-time performance by 3.13% on average and up to 8.93%. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
42. A 3-D Scene Management Method Based on the Triangular Mesh for Large-Scale Web3D Scenes.
- Author
-
Luo, Lihong and Yang, Xian
- Subjects
GEOGRAPHIC information systems ,THREE-dimensional display systems ,DYNAMIC loads ,VIRTUAL reality ,COMPUTER memory management - Abstract
Real-time rendering of large-scale Web3D scenes was difficult to implement in virtual reality systems and geographic information systems (GIS) in the past because of technical constraints on CPU, memory, and network bandwidth. In this paper, a model management strategy was proposed based on triangular meshes, in which neighboring buildings are treated as nodes and connected. Each node in the mesh has a set of level of detail (LOD) models, including high, medium, and low precision models. The high precision LOD of a node can be either a model file or a sub-triangular mesh. The three-dimensional (3-D) models in a complex scene can be flexibly managed with nested triangular meshes. The memory management strategy and the display management strategy were discussed in this paper. According to the experimental results, the proposed method effectively achieves progressive downloading, dynamic loading, and real-time display for a large-scale 3-D scene. Its performance is better than that of traditional methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
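The per-node LOD selection described in the abstract above boils down to choosing a model precision from the viewer's distance. A minimal sketch (the thresholds and function name are invented for illustration, not taken from the paper):

```python
# Illustrative LOD pick for a scene-graph node: choose the high, medium,
# or low precision model by viewer distance. Thresholds are made up.

def pick_lod(distance):
    if distance < 100:
        return 'high'      # nearby buildings get the detailed model
    if distance < 500:
        return 'medium'
    return 'low'           # distant nodes use the cheapest model

assert [pick_lod(d) for d in (50, 250, 900)] == ['high', 'medium', 'low']
```

In the paper's scheme, the selected high-precision "model" may itself be a nested sub-triangular mesh, so this decision recurses down the mesh hierarchy as the viewer approaches.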
43. Energy-Aware Motion and Disparity Estimation System for 3D-HEVC With Run-Time Adaptive Memory Hierarchy.
- Author
-
Afonso, Vladimir, Conceicao, Ruhan, Saldanha, Mario, Braatz, Luciano, Perleberg, Murilo, Correa, Guilherme, Porto, Marcelo, Agostini, Luciano, Zatt, Bruno, and Susin, Altamiro
- Subjects
- *
COMPUTER memory management , *ENERGY consumption of computers , *VIDEO coding , *MEMORY hierarchy (Computer science) , *EMBEDDED computer systems - Abstract
The popularization of multimedia services has pushed forward the development of 2D/3D video-capable embedded mobile devices. Such devices require efficient energy/memory-management strategies to deal with severe memory/processing requirements and limited energy supply. Therefore, we propose a motion and disparity estimation (ME and DE) system—the most memory/processing demanding encoding steps—for the 3D High Efficiency Video Coding (3D-HEVC) standard. It was designed for low energy consumption, featuring a run-time adaptive memory hierarchy. The processing unit employs flexible coding order and optimizations to reduce the computational effort by exploring the inter-channel and inter-view redundancies. The memory hierarchy features window-based prefetching, data reuse, subsampling, and dynamic voltage scaling controlled by our depth-based dynamic search window resizing algorithm. Memory results demonstrate an average on-chip energy reduction of 79% in comparison to the widely used Level-C solution for a 45-nm technology. The proposed energy-aware ME and DE system dissipates 7.55 W while processing three HD 1080p views (video + depth) at 30 frames per second and presents a mean energy consumption of 0.107 J per access unit. To the best of our knowledge, this is the first work that proposes a real-time ME/DE system for the 3D-HEVC standard with an adaptive memory hierarchy. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
44. Modeling Non-Uniform Memory Access on Large Compute Nodes with the Cache-Aware Roofline Model.
- Author
-
Denoyelle, Nicolas, Goglin, Brice, Ilic, Aleksandar, Jeannot, Emmanuel, and Sousa, Leonel
- Subjects
- *
BANDWIDTHS , *COMPUTER systems , *DEBUGGING , *ARTIFICIAL intelligence , *COMPUTER memory management - Abstract
NUMA platforms, emerging memory architectures with on-package high bandwidth memories bring new opportunities and challenges to bridge the gap between computing power and memory performance. Heterogeneous memory machines feature several performance trade-offs, depending on the kind of memory used, when writing or reading it. Finding memory performance upper-bounds subject to such trade-offs aligns with the numerous interests of measuring computing system performance. In particular, representing applications performance with respect to the platform performance bounds has been addressed in the state-of-the-art Cache-Aware Roofline Model (CARM) to troubleshoot performance issues. In this paper, we present a Locality-Aware extension (LARM) of the CARM to model NUMA platforms bottlenecks, such as contention and remote access. On top of this, the new contribution of this paper is the design and validation of a novel hybrid memory bandwidth model. This new hybrid model quantifies the achievable bandwidth upper-bound under above-described trade-offs with less than 3 percent error. Hence, when comparing applications performance with the maximum attainable performance, software designers can now rely on more accurate information. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
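The roofline bound extended in the abstract above has a very compact core: attainable performance is capped either by peak compute or by memory bandwidth times arithmetic intensity. A minimal sketch of the basic (non-locality-aware) bound, with illustrative numbers:

```python
# Minimal sketch of the roofline bound underlying CARM/LARM: attainable
# performance is the minimum of the compute roof and the memory roof
# (bandwidth x arithmetic intensity). Numbers below are illustrative.

def roofline(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s for a kernel of given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

# Memory-bound kernel: low intensity sits on the sloped bandwidth roof.
assert roofline(1000.0, 100.0, 0.5) == 50.0
# Compute-bound kernel: high intensity hits the flat compute roof.
assert roofline(1000.0, 100.0, 20.0) == 1000.0
```

The paper's locality-aware extension replaces the single `bandwidth_gbs` with NUMA-dependent bandwidths (local vs. remote, contended vs. uncontended), yielding a family of such roofs per memory configuration.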
45. HiNextApp: A context-aware and adaptive framework for app prediction in mobile systems.
- Author
-
Liu, Duo, Xiang, Chaoneng, Li, Shiming, Ren, Jinting, Liu, Renping, Liang, Liang, Guan, Yong, and Chen, Xianzhang
- Subjects
MOBILE apps ,COMPUTER memory management - Abstract
• We perform a detailed analysis of mobile user behaviors. • We propose an app-prediction framework to effectively improve the prediction accuracy and reduce the training costs. • We conduct evaluations to verify the effectiveness of HiNextApp and analyze the overhead of the framework. A variety of applications (Apps) installed on mobile systems such as smartphones enrich our lives, but make system management more difficult. For example, finding a specific App becomes more inconvenient as more Apps are installed on a smartphone, and App response time can grow because of the gap between more and larger Apps and limited memory capacity. Recent work has proposed several methods of predicting the next used Apps in the immediate future (hereinafter app-prediction) to solve these issues, but faces the problems of low prediction accuracy and high training costs. In particular, applying app-prediction to memory management (such as LMK) and App prelaunching places high requirements on the prediction accuracy and training costs. In this paper, we propose an app-prediction framework, named HiNextApp, to improve the app-prediction accuracy and reduce training costs in mobile systems. HiNextApp is based on contextual information, and can adjust the size of prediction periods adaptively. The framework mainly consists of two parts: a non-uniform Bayes model and an elastic algorithm. The experimental results show that HiNextApp can effectively improve the prediction accuracy and reduce training times. Besides, compared with the traditional Bayes model, the overhead of our framework is relatively low. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
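As a rough illustration of the Bayes-model half of such a framework (not the paper's non-uniform model or its elastic algorithm), a minimal contextual next-app predictor might score each app by its prior times smoothed per-context likelihoods:

```python
from collections import Counter, defaultdict

class NextAppPredictor:
    """Toy contextual naive-Bayes next-app predictor (illustrative only)."""

    def __init__(self):
        self.app_counts = Counter()                 # how often each app was used
        self.ctx_counts = defaultdict(Counter)      # (feature, value) -> per-app counts

    def train(self, app, context):
        """Record one app launch together with its context features."""
        self.app_counts[app] += 1
        for feat, val in context.items():
            self.ctx_counts[(feat, val)][app] += 1

    def predict(self, context):
        """Return the app with the highest posterior score for this context."""
        best, best_score = None, -1.0
        total = sum(self.app_counts.values())
        for app, n in self.app_counts.items():
            score = n / total                       # prior P(app)
            for feat, val in context.items():
                c = self.ctx_counts[(feat, val)][app]
                # Laplace smoothing so unseen contexts do not zero out the score
                score *= (c + 1) / (n + len(self.app_counts))
            if score > best_score:
                best, best_score = app, score
        return best
```

A usage example: after training on two morning "mail" launches and one evening "game" launch, a morning context predicts "mail".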
46. Revive Bad Flash-Memory Pages by HLC Scheme.
- Author
-
Lin, Han-Yi and Hsieh, Jen-Wei
- Subjects
- *
COMPUTER memory management , *FLASH memory , *ERROR correction (Information theory) , *ERROR-correcting codes , *NAND gates - Abstract
In recent years, flash memory has been widely used in embedded systems, portable devices, and high-performance storage products due to its nonvolatility, shock resistance, low power consumption, and high performance. To reduce product cost, multi-level-cell (MLC) flash memory has been proposed; compared with traditional single-level-cell (SLC) flash memory, which stores only one bit of data per cell, each MLC cell can store two or more bits of data. MLC can thus achieve a larger capacity and reduce the cost per unit. However, MLC also suffers from degradation in both performance and reliability. In this paper, we try to enhance the reliability and reduce the product cost of flash-memory-based solid-state drives (SSDs) from a totally different perspective. We propose the half-level-cell (HLC) scheme to manage and reuse the worn-out space in an SSD; through our management scheme, the system can treat two bad pages as a normal page without sacrificing performance or reliability. The proposed scheme operates purely at the software/firmware level, so there is no need to change the hardware. The experimental results show that the lifetime of an SSD with our proposed HLC scheme can be extended by 50.56% under the Windows workload and by up to 65.45% under the multimedia workload. When we apply the HLC scheme to the flash-memory cache of hybrid storage systems, the response time can be improved by up to 20.57%. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
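One way to picture the core HLC idea of treating two bad pages as one normal page is to store each bit in whichever of the paired pages is still reliable at that offset. This is an assumed toy model with explicit bad-bit masks, not the paper's actual firmware scheme; it requires pairing pages whose bad positions do not overlap.

```python
class HLCPage:
    """Toy pairing of two worn-out flash pages into one usable logical page.

    Each physical page carries a set of stuck (unreliable) bit positions;
    a bit is read from whichever page is good at that offset.
    """

    def __init__(self, bad_mask_a, bad_mask_b, size):
        self.bad_a, self.bad_b = bad_mask_a, bad_mask_b  # sets of bad offsets
        self.a = [0] * size
        self.b = [0] * size

    def write(self, bits):
        """Write the logical data into every reliable position of both pages."""
        for i, bit in enumerate(bits):
            if i not in self.bad_a:
                self.a[i] = bit
            if i not in self.bad_b:
                self.b[i] = bit

    def read(self):
        """Recover each bit from a reliable position in one of the two pages."""
        return [self.a[i] if i not in self.bad_a else self.b[i]
                for i in range(len(self.a))]
```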
47. Model-based prediction of automatic memory management and garbage collection behavior.
- Author
-
Willnecker, Felix and Krcmar, Helmut
- Subjects
- *
REFUSE collection , *COMPUTER memory management , *ARCHITECTURE evaluation , *CAPACITY requirements planning , *MEMORY - Abstract
Performance models focus on resource consumption and the effects of CPU, network, or hard-disk utilization. These resources usually have the largest effect on the response times and throughput of an application. However, deficient memory management can have severe effects on an application and its runtime, such as overlong response times or even crashes. As memory management has been disregarded in performance simulations, we address this gap with an approach based on memory measurements and derived metrics to predict the behavior of this resource and the effects on the CPU. Although numerous works exist that analyze memory management and especially garbage collections, accurate prediction models are rare. We demonstrate the automatic extraction of memory behavior using a performance model generator. Furthermore, the approach is evaluated using the SPECjEnterprise2010 and the SPECjEnterpriseNEXT industry benchmark, using different resource environments, garbage collection algorithms, and workloads. This work demonstrates that a certain set of probabilities allows one to create a memory profile for an architecture and predict the behavior of the memory management. The results of such predictions can be used for better capacity planning (on-premise), cost-prediction (cloud), architecture evaluation and optimization, or memory profiling. This approach allows for a continuous model-based evaluation of an enterprise architecture regarding its memory footprint. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
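A heavily simplified sketch of what such a model-based prediction could look like: from a measured memory profile (allocation rate, heap size, object survival ratio, per-collection pause), extrapolate how many garbage collections will occur over a time horizon and how much pause time they cost. All parameters below are hypothetical and the model is far coarser than the paper's.

```python
def predict_gc(heap_mb, alloc_rate_mb_s, survival_ratio, gc_pause_s, horizon_s):
    """Predict GC count and total pause time over a horizon (toy model)."""
    free_mb = heap_mb          # initially the whole heap is free
    t = 0.0
    collections, total_pause = 0, 0.0
    while True:
        t += free_mb / alloc_rate_mb_s      # time until the free space fills up
        if t >= horizon_s:
            break
        collections += 1
        total_pause += gc_pause_s
        # after a collection, the surviving fraction stays resident
        free_mb = heap_mb * (1.0 - survival_ratio)
    return collections, total_pause
```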
48. Compact and Flexible FPGA Implementation of Ed25519 and X25519.
- Author
-
TURAN, FURKAN and VERBAUWHEDE, INGRID
- Subjects
COMPUTER memory management ,ELLIPTIC curve cryptography ,FINITE state machines ,DIGITAL signal processing ,SYSTEMS on a chip ,GATE array circuits ,ELLIPTIC curves - Abstract
This article describes a field-programmable gate array (FPGA) cryptographic architecture, which combines the elliptic-curve-based Ed25519 digital signature algorithm and the X25519 key establishment scheme in a single module. Cryptographically, these are high-security elliptic curve cryptography algorithms with short key sizes and impressive execution times in software. Our goal is to provide a lightweight FPGA module that enables them on resource-constrained devices, specifically for Internet of Things (IoT) applications. In addition, we aim at extensibility with customisable countermeasures against timing and differential power analysis side-channel attacks and fault-injection attacks. For the former, we offer a choice between time-optimised versus constant-time execution, with or without Z-coordinate randomisation and base-point blinding; and for the latter, we offer enabling or disabling default-case statements in the Finite State Machine (FSM) descriptions. To obtain compactness and at the same time fast execution times, we make maximum use of the Digital Signal Processing (DSP) slices on the FPGA. We designed a single arithmetic unit that is flexible to support operations with two moduli and non-modulus arithmetic. In addition, our design benefits from in-place memory management and the local storage of inputs in DSP slices' pipeline registers, and takes advantage of distributed memory. Together, these eliminate a memory-access bottleneck. The flexibility is offered by a microcode-supported instruction-set architecture. Our design targets 7-Series Xilinx FPGAs and is prototyped on a Zynq System-on-Chip (SoC). The base design combines Ed25519 and X25519 in a single module, and its implementation requires only around 11.1K Lookup Tables (LUTs), 2.6K registers, and 16 DSP slices. Also, it achieves a performance of 1.6ms for signature generation and 3.6ms for signature verification for a 1024-bit message with an 82MHz clock.
Moreover, the design can be optimised only for X25519, which gives the most compact FPGA implementation compared to previously published X25519 implementations. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
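One concrete building block of the constant-time execution option mentioned above is the conditional swap used in the X25519 Montgomery ladder, which selects between two values without data-dependent branches. Below is a software analogue of that primitive for illustration only; the paper implements its countermeasures in FPGA hardware.

```python
def cswap(swap, a, b, width_bits=255):
    """Branch-free conditional swap: returns (b, a) if swap == 1, else (a, b).

    The swap bit is expanded into an all-ones or all-zeros mask, so the
    same sequence of operations runs regardless of the secret bit.
    """
    mask = -swap & ((1 << width_bits) - 1)  # all ones iff swap == 1
    t = mask & (a ^ b)
    return a ^ t, b ^ t
```

In a real ladder the swap bit is a secret scalar bit, so avoiding a branch on it is exactly the timing countermeasure being configured.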
49. A Thermal-Aware Physical Space Reallocation for Open-Channel SSD With 3-D Flash Memory.
- Author
-
Wang, Yi, Zhang, Mingxu, Yang, Xuan, and Li, Tao
- Subjects
- *
FLASH memory , *THRESHOLD voltage , *REFUSE collection , *RESOURCE management , *COMPUTER architecture , *COMPUTER memory management - Abstract
3-D flash memory faces a number of challenges, including thermal issues and process variation. The high temperature will cause charge loss and lead to the fluctuation of threshold voltage. To address the thermal issue of 3-D flash memory, this paper presents ThermAlloc, a novel thermal-aware physical space allocation strategy for open-channel solid-state drive with 3-D charge trapping flash memory. ThermAlloc permutes the allocation of physical blocks. Consecutively accessed logical blocks are distributed to different physical locations in order to prevent the accumulation of hotspots. The objective is to postpone garbage collection operations and keep the distribution of block temperature well under control. We demonstrate the viability of the proposed technique using a set of extensive experiments. Experimental results show that ThermAlloc can reduce the peak temperature by 26.94% with 3.15% extra worst-case response time in comparison with the baseline scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
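The allocation policy can be caricatured as "always write to the coolest physical block", which naturally spreads consecutively accessed logical blocks across the chip and prevents hotspot accumulation. This is a toy sketch under that assumption, not ThermAlloc's actual 3-D-layout-aware policy.

```python
import heapq

def thermal_allocate(requests, n_blocks):
    """Map each logical-block write to the currently coolest physical block.

    requests: sequence of logical block ids, in access order.
    Returns a dict logical-block -> physical-block placement (toy model).
    """
    heap = [(0.0, pb) for pb in range(n_blocks)]   # (temperature, block id)
    heapq.heapify(heap)
    placement = {}
    for lb in requests:
        temp, pb = heapq.heappop(heap)             # coolest block wins
        placement[lb] = pb
        heapq.heappush(heap, (temp + 1.0, pb))     # each access heats the block
    return placement
```

With four consecutive logical blocks and four physical blocks, each write lands on a distinct physical block, spreading the heat.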
50. Efficient Algorithms for Bayesian Nearest Neighbor Gaussian Processes.
- Author
-
Finley, Andrew O., Datta, Abhirup, Cook, Bruce D., Morton, Douglas C., Andersen, Hans E., and Banerjee, Sudipto
- Subjects
- *
GAUSSIAN processes , *NUMERICAL solutions for linear algebra , *LIDAR , *RANDOM forest algorithms , *FOREST canopies , *ALGORITHMS , *COMPUTER memory management - Abstract
We consider alternate formulations of recently proposed hierarchical nearest neighbor Gaussian process (NNGP) models for improved convergence, faster computing time, and more robust and reproducible Bayesian inference. Algorithms are defined that improve CPU memory management and exploit existing high-performance numerical linear algebra libraries. Computational and inferential benefits are assessed for alternate NNGP specifications using simulated datasets and remotely sensed light detection and ranging data collected over the U.S. Forest Service Tanana Inventory Unit (TIU) in a remote portion of Interior Alaska. The resulting data product is the first statistically robust map of forest canopy for the TIU. Supplemental materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
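The structural idea behind NNGP models is that, after ordering the locations, each point conditions only on a small set of its nearest previously ordered neighbors, which turns the joint density into a product of low-dimensional conditionals with a sparse precision structure. A minimal sketch of building those conditioning sets (model structure only, not the paper's inference algorithms):

```python
def nngp_neighbor_sets(coords, m):
    """Conditioning sets for an NNGP: point i keeps its m nearest
    predecessors in the given ordering (coords is a list of (x, y))."""
    neighbors = []
    for i, (x, y) in enumerate(coords):
        # rank only the points that come earlier in the ordering
        past = sorted(range(i),
                      key=lambda j: (x - coords[j][0])**2 + (y - coords[j][1])**2)
        neighbors.append(past[:m])
    return neighbors
```

For four collinear points and m = 2, each point keeps its two nearest predecessors, e.g. the last point conditions on the two just before it.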