31 results for "Xianzhang Chen"
Search Results
2. SENTunnel: Fast Path for Sensor Data Access on Automotive Embedded Systems
- Authors
Rongwei Zheng, Xianzhang Chen, Duo Liu, Junjie Feng, Jiapin Wang, Ao Ren, Chengliang Wang, and Yujuan Tan
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2022
3. FRL: Fast and Reconfigurable Accelerator for Distributed Sound Source Localization
- Authors
Xiaofeng Ding, Chengliang Wang, Heping Liu, Zhihai Zhang, Xianzhang Chen, Yujuan Tan, Duo Liu, and Ao Ren
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2022
4. Horae: A Hybrid I/O Request Scheduling Technique for Near-Data Processing-Based SSD
- Authors
Jiali Li, Xianzhang Chen, Duo Liu, Lin Li, Jiapin Wang, Zhaoyang Zeng, Yujuan Tan, and Lei Qiao
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2022
5. eRDAC: Efficient and Reliable Remote Direct Access and Control for Embedded Systems
- Authors
Junjie Feng, Xianzhang Chen, Duo Liu, Weigong Zhang, Jiapin Wang, Rongwei Zheng, and Yujuan Tan
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2022
6. Self-Adapting Channel Allocation for Multiple Tenants Sharing SSD Devices
- Authors
Duo Liu, Yujuan Tan, Renping Liu, Runyu Zhang, Xianzhang Chen, and Liang Liang
- Subjects
Channel allocation schemes, Computer science, Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software, Computer network
- Published
- 2022
7. Bridging Mismatched Granularity Between Embedded File Systems and Flash Memory
- Authors
Runyu Zhang, Duo Liu, Chengliang Wang, Chaoshu Yang, Xiongxiong She, Xianzhang Chen, Zhaoyan Shen, and Yujuan Tan
- Subjects
File system, Bridging (networking), Write amplification, Computer science, Computer Graphics and Computer-Aided Design, Flash memory, Metadata, Embedded system, Granularity, Electrical and Electronic Engineering, Performance improvement, Software
- Abstract
The mismatch between logical and physical I/O granularity inhibits the deployment of embedded file systems. Most existing embedded file systems manage logical space in small units, which no longer match the operation granularity of large-capacity flash memory. Manually enlarging the logical I/O granularity of a file system requires enormous transplanting effort. Moreover, large logical pages aggravate the write amplification problem, leading to severe space consumption and performance collapse. This article designs a novel storage middleware, NV-middle, for legacy embedded file systems on large-capacity flash memories. With NV-middle, legacy embedded storage schemes can be smoothly transplanted to new platforms with different hardware read/write granularity, and legacy optimization schemes can be largely preserved without inducing write amplification. We implement NV-middle with the state-of-the-art embedded file system YAFFS2. Comprehensive evaluations show that NV-middle achieves severalfold performance improvements over a manually transplanted YAFFS2 across various workloads.
- Published
- 2021
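The granularity-bridging idea in the abstract above can be sketched in a few lines (hypothetical class and constant names; the real NV-middle sits in the kernel between the file system and the flash driver): small logical pages issued by the legacy file system are packed into a buffer and flushed to flash only in full physical-page units.

```python
# Sketch of a granularity-bridging middleware layer (hypothetical names,
# illustrating the idea behind NV-middle): a legacy file system issues
# small 512 B logical pages, while the flash device only accepts
# 4 KiB physical-page writes.

LOGICAL = 512        # logical page size used by the legacy file system
PHYSICAL = 4096      # physical write granularity of the flash chip
RATIO = PHYSICAL // LOGICAL

class FlashDevice:
    """Accepts only whole physical pages; counts device writes."""
    def __init__(self):
        self.pages = {}
        self.writes = 0

    def write_page(self, ppn, data):
        assert len(data) == PHYSICAL
        self.pages[ppn] = data
        self.writes += 1

class Middleware:
    """Packs logical pages into physical-page buffers before flushing."""
    def __init__(self, dev):
        self.dev = dev
        self.buf = {}            # ppn -> bytearray being filled

    def write_logical(self, lpn, data):
        assert len(data) == LOGICAL
        ppn, slot = divmod(lpn, RATIO)
        page = self.buf.setdefault(ppn, bytearray(PHYSICAL))
        page[slot * LOGICAL:(slot + 1) * LOGICAL] = data

    def flush(self):
        for ppn, page in self.buf.items():
            self.dev.write_page(ppn, bytes(page))
        self.buf.clear()

dev = FlashDevice()
mw = Middleware(dev)
for lpn in range(16):                # sixteen 512 B logical writes
    mw.write_logical(lpn, bytes([lpn]) * LOGICAL)
mw.flush()
# Sixteen small logical writes collapse into two physical-page writes.
assert dev.writes == 2
```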
8. Contour: A Process Variation Aware Wear-Leveling Mechanism for Inodes of Persistent Memory File Systems
- Authors
Wang Xinxin, Chaoshu Yang, Qingfeng Zhuge, Weiwen Jiang, Xianzhang Chen, and Edwin H.-M. Sha
- Subjects
File system, Computer science, Linux kernel, inode, Parallel computing, Process variation, Memory management, Computational Theory and Mathematics, Hardware and Architecture, Overhead (computing), Table (database), Software, Wear leveling, Theoretical Computer Science
- Abstract
Existing persistent memory file systems exploit fast, byte-addressable persistent memory (PM) to boost storage performance but ignore the limited endurance of PM. In particular, the PM storing the inode section is extremely vulnerable because inodes are the most frequently updated structures, stay at fixed locations throughout their lifetime, and require immediate persistency. The large endurance variation across persistent memory domains caused by process variation makes things even worse. In this article, we propose Contour, a process variation aware wear-leveling mechanism for the inode section of persistent memory file systems. Contour first enables the movement of inodes by virtualizing them through a deflection table. It then adopts a cross-domain migration algorithm and an intra-domain migration algorithm to balance writes across and within the memory domains. We implement Contour in Linux kernel 4.4.30 on a real persistent memory file system, SIMFS, and evaluate it with standard benchmarks, including Filebench, MySQL, and FIO. Extensive experimental results show that Contour improves the wear ratios of pages by 417.8× and 4.5× over the original SIMFS and PCV, the state-of-the-art inode wear-leveling algorithm, respectively. Meanwhile, the average performance overhead and wear overhead of Contour are only 0.87 and 0.034 percent, respectively, in application-level workloads.
- Published
- 2021
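The deflection-table mechanism above can be sketched in a few lines (hypothetical structure and threshold; the real Contour additionally distinguishes cross-domain from intra-domain migration): file systems refer to inodes by a stable virtual number, while a table maps each number to a movable physical slot, so hot inodes can migrate to less-worn cells.

```python
# Sketch of inode virtualization via a deflection table (hypothetical,
# illustrating the core idea of Contour).

THRESHOLD = 4   # migrate when a slot is this many writes over the minimum

class InodeSection:
    def __init__(self, nslots):
        self.deflect = list(range(nslots))   # virtual inode -> physical slot
        self.wear = [0] * nslots             # per-slot write counter

    def write_inode(self, vino):
        slot = self.deflect[vino]
        self.wear[slot] += 1
        # Wear leveling: move the hot inode to the least-worn slot by
        # swapping table entries; the virtual number never changes.
        coldest = min(range(len(self.wear)), key=self.wear.__getitem__)
        if self.wear[slot] - self.wear[coldest] >= THRESHOLD:
            other = self.deflect.index(coldest)   # owner of the cold slot
            self.deflect[vino], self.deflect[other] = coldest, slot

sec = InodeSection(4)
for _ in range(20):        # virtual inode 0 is updated far more than others
    sec.write_inode(0)
# Wear spreads across slots instead of concentrating on slot 0.
assert max(sec.wear) - min(sec.wear) < 20
```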
9. Optimizing synchronization mechanism for block-based file systems using persistent memory
- Authors
Xianzhang Chen, Qingfeng Zhuge, Duo Liu, Edwin H.-M. Sha, Runyu Zhang, and Chaoshu Yang
- Subjects
File system, Data consistency, Computer Networks and Communications, Computer science, ext4, Linux kernel, Data loss, Synchronization (computer science), Persistence (computer science), Hardware and Architecture, Embedded system, Overhead (computing), Software, Block (data storage)
- Abstract
Existing block-based file systems employ a buffer caching mechanism to improve performance, which may result in data loss upon power failure or system crash. To avoid data loss, file systems provide synchronization operations that allow applications to synchronously write dirty data from the DRAM cache back to the slow block devices. However, synchronization operations can severely degrade file system performance since they defeat the purpose of buffer caching. In this paper, we propose to relieve the overhead of synchronization operations while ensuring data reliability by utilizing a small persistent memory (PM). The proposed Persistent Memory assisted Write-back (PMW) mechanism includes a dedicated copy-on-write scheme to guarantee data consistency and a write-back mechanism spanning PM and the block devices. We implement PMW in the Linux kernel based on Ext4. The experimental results show that PMW achieves about 2.2× and 1.6× performance improvement on the TPC-C workload over the original Ext4 and AFCM, the state-of-the-art PM-based synchronization mechanism, respectively.
- Published
- 2020
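The staging idea above can be sketched as a three-tier write path (hypothetical class and method names; the real PMW operates on Ext4 pages inside the kernel): fsync stages dirty pages in fast PM with copy-on-write, and the slow block device is updated lazily afterwards.

```python
# Sketch of persistent-memory-assisted synchronization (hypothetical,
# illustrating the PMW idea): after fsync(), data is durable in PM even
# though the block device has not been touched yet.

class PMWSync:
    def __init__(self):
        self.dram_cache = {}    # page id -> dirty data (volatile)
        self.pm_log = {}        # page id -> persisted copy (fast, durable)
        self.block_dev = {}     # page id -> data (slow, durable)

    def write(self, page, data):
        self.dram_cache[page] = data          # buffered write, not durable

    def fsync(self):
        # Copy-on-write into PM: the old PM copy stays valid until the
        # new one is fully written, keeping the log consistent.
        for page, data in self.dram_cache.items():
            shadow = data                      # build the new version first
            self.pm_log[page] = shadow         # then replace the old copy
        self.dram_cache.clear()                # durable now; fsync is cheap

    def writeback(self):
        # Lazy migration from PM to the block device, off the sync path.
        for page, data in list(self.pm_log.items()):
            self.block_dev[page] = data
            del self.pm_log[page]

fs = PMWSync()
fs.write("a", b"v1")
fs.fsync()                      # durable in PM without touching the disk
assert fs.pm_log["a"] == b"v1" and "a" not in fs.block_dev
fs.writeback()
assert fs.block_dev["a"] == b"v1" and not fs.pm_log
```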
10. Separable Binary Convolutional Neural Network on Embedded Systems
- Authors
Yujuan Tan, Chaoshu Yang, Liang Liang, Renping Liu, Yingjian Ling, Duo Liu, Runyu Zhang, Weilue Wang, Chunhua Xiao, and Xianzhang Chen
- Subjects
Computer science, Binary number, Convolutional neural network, Separable space, Kernel (image processing), Computational Theory and Mathematics, Hardware and Architecture, Embedded system, Principal component analysis, Network performance, Software, Theoretical Computer Science
- Abstract
We have witnessed the tremendous success of deep neural networks. However, this success comes with considerable memory and computational costs that make it difficult to deploy these networks directly on resource-constrained embedded systems. To address this problem, we propose TaijiNet, a separable binary network, to reduce storage and computational overhead while maintaining comparable accuracy. Furthermore, we introduce a strategy called partial binarized convolution, which binarizes only unimportant kernels to efficiently balance network performance and accuracy. Our approach is evaluated on the CIFAR-10 and ImageNet datasets. The experimental results show that with TaijiNet, the separable binary versions of AlexNet and ResNet-18 achieve 26× and 6.4× compression rates with accuracy comparable to the full-precision versions, respectively. In addition, by adjusting the PCA threshold, the xnor version of Taiji-AlexNet improves accuracy by 4-8 percent compared with other state-of-the-art methods.
- Published
- 2020
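The partial binarization strategy above can be illustrated with a small sketch (hypothetical helper names; importance is scored here by mean absolute weight, whereas the paper uses a PCA-based criterion): low-importance kernels are binarized to two scaled values, while important kernels stay full precision.

```python
# Sketch of partial binarized convolution (hypothetical, illustrating
# the TaijiNet idea): binarize only the "unimportant" kernels.

def importance(kernel):
    """Toy importance score: mean absolute weight (the paper uses PCA)."""
    return sum(abs(w) for w in kernel) / len(kernel)

def binarize(kernel):
    """XNOR-style binarization: weights become {-alpha, +alpha}."""
    alpha = sum(abs(w) for w in kernel) / len(kernel)   # scaling factor
    return [alpha if w >= 0 else -alpha for w in kernel]

def partial_binarize(kernels, keep_ratio=0.5):
    """Keep the top keep_ratio kernels full precision; binarize the rest."""
    ranked = sorted(range(len(kernels)),
                    key=lambda i: importance(kernels[i]), reverse=True)
    keep = set(ranked[:int(len(kernels) * keep_ratio)])
    return [k if i in keep else binarize(k) for i, k in enumerate(kernels)]

kernels = [[0.9, -0.8, 0.7],      # important: left untouched
           [0.1, -0.05, 0.02],    # unimportant: binarized
           [0.6, 0.5, -0.4],      # important: left untouched
           [0.01, 0.02, -0.03]]   # unimportant: binarized
out = partial_binarize(kernels)
assert out[0] == kernels[0] and out[2] == kernels[2]
assert len(set(abs(w) for w in out[1])) == 1   # single magnitude: binary
```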
11. ChordMap: Automated Mapping of Streaming Applications onto CGRA
- Authors
Zhaoying Li, Tulika Mitra, Anuj Pathania, Dhananjaya Wijerathne, and Xianzhang Chen
- Subjects
Computer science, Compiler, Parallel computing, Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Throughput, Software
- Abstract
Streaming applications, consisting of several communicating kernels, are ubiquitous in embedded computing systems. Synchronous data flow (SDF) graphs are commonly used to capture the complex communication patterns among the kernels. General-purpose processors cannot meet the throughput requirements of the compute-intensive kernels in current and emerging applications. Coarse-grained reconfigurable arrays (CGRAs) are well-suited to accelerate individual kernels, and compiler technology is well-developed for mapping a single kernel onto a CGRA accelerator. However, the system-level mapping of an entire streaming application onto a resource-constrained CGRA to maximize throughput remains unexplored. We introduce a novel CGRA mapper, called ChordMap, to automatically generate a high-quality mapping of streaming applications represented as SDF graphs onto CGRAs. We propose an optimized spatio-temporal mapping with modulo scheduling that judiciously employs concurrent execution of multiple kernels to improve parallelism and thereby maximize throughput. ChordMap achieves, on average, 1.74× higher throughput across eight streaming applications compared to the state-of-the-art.
- Published
- 2022
12. Downsizing Without Downgrading: Approximated Dynamic Time Warping on Nonvolatile Memories
- Authors
Yingjian Ling, Xianzhang Chen, Duo Liu, Po-Chun Huang, Renping Liu, Yi Gu, Liang Liang, Kan Zhong, and Xingni Li
- Subjects
Dynamic time warping, Similarity (geometry), Computer science, Computer Graphics and Computer-Aided Design, Euclidean distance, Data mining, Electrical and Electronic Engineering, Time series, Wireless sensor network, Software
- Abstract
In recent years, time-series data have emerged in a variety of application domains, such as wireless sensor networks and surveillance systems. To identify the similarity between time-series data, the Euclidean distance and its variations are common metrics that quantify the differences between time-series data. However, the Euclidean distance is limited by its inability to elastically shift with the time axis, which motivates the development of dynamic time warping (DTW) algorithms. While DTW algorithms have been proven very useful in diversified applications like speech recognition, their efficacy might be seriously affected by the resolution of the time-series data. However, high-resolution time-series data might take up a gigantic amount of main memory and storage space, which will slow down the DTW analysis procedure. This makes the upscaling of DTW analysis more challenging, especially for in-memory data analytics platforms with limited nonvolatile memory space. In this paper, we propose a strategy to downsample time-series data to significantly reduce their size without seriously affecting the precision of the results obtained by DTW algorithms (downsizing without downgrading). In other words, this paper proposes a technique to remove the unimportant details that are largely ignored by DTW algorithms. The efficacy of the proposed technique is verified by a series of experimental studies, where the results are quite encouraging.
- Published
- 2020
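The core trade-off above, a smaller series with nearly unchanged DTW results, can be sketched with the classic DTW recurrence and a naive decimation step (hypothetical helper names; the paper's downsampling strategy is more selective about which samples it drops).

```python
# Minimal dynamic time warping plus naive downsampling, illustrating
# "downsizing without downgrading": the similarity ranking survives
# while the DP table shrinks quadratically with the factor.

def dtw(a, b):
    """Classic O(len(a)*len(b)) DTW distance with absolute-difference cost."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],        # insertion
                                 d[i][j - 1],        # deletion
                                 d[i - 1][j - 1])    # match
    return d[n][m]

def downsample(series, factor):
    """Keep every factor-th point, shrinking memory footprint."""
    return series[::factor]

x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8]     # x shifted in time: similar shape
z = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]     # unrelated shape

# DTW tolerates the time shift, and the ranking of similarities is
# preserved after halving the resolution of every series.
assert dtw(x, y) < dtw(x, z)
assert dtw(downsample(x, 2), downsample(y, 2)) < \
       dtw(downsample(x, 2), downsample(z, 2))
```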
13. On the Design of Time-Constrained and Buffer-Optimal Self-Timed Pipelines
- Authors
Edwin H.-M. Sha, Lei Yang, Weiwen Jiang, Jingtong Hu, Qingfeng Zhuge, and Xianzhang Chen
- Subjects
Marked graph, Matching (graph theory), Computer science, Pipeline (computing), Parallel computing, Computer Graphics and Computer-Aided Design, Synchronization, Reduction (complexity), Asynchronous communication, Electrical and Electronic Engineering, Field-programmable gate array, Integer programming, Software
- Abstract
Pipelining is a powerful technique for achieving high performance in computing systems. However, as computing platforms become large-scale and integrate heterogeneous processing elements (PEs) such as CPUs, GPUs, and field-programmable gate arrays, it is difficult to employ a global clock to achieve synchronous pipelines. Therefore, self-timed (or asynchronous) pipelines are usually adopted. Nevertheless, due to their complex runtime behavior, performance modeling and systematic optimization for self-timed pipeline (STP) systems are more complicated than for synchronous ones. This paper employs marked graph theory to model STPs and presents algorithms to detect performance bottlenecks. Based on the proposed model, we observe that system performance can be improved by inserting buffers. Due to the limited memory resources on the PEs, it is critical to minimize the number of buffers in STPs while satisfying the required timing constraints. In this paper, we propose integer linear programming formulations to obtain optimal solutions and devise efficient algorithms to obtain near-optimal solutions. Experimental results show that the proposed algorithms achieve a 53.10% improvement in maximum performance and a 54.04% reduction in the number of buffers compared with the technique for the slack matching problem.
- Published
- 2019
14. HydraFS: an efficient NUMA-aware in-memory file system
- Authors
Kai Liu, Edwin H.-M. Sha, Xianzhang Chen, Zhixiang Liu, Ting Wu, Qingfeng Zhuge, and Chunhua Xiao
- Subjects
File system, Computer Networks and Communications, Computer science, Linux kernel, Thread (computing), Scalability, Software
- Abstract
Emerging persistent file systems are designed to achieve high-performance data processing by effectively exploiting the advanced features of non-volatile memory (NVM). Non-uniform memory access (NUMA) architectures are widely used in high-performance computing and data centers due to their scalability. However, existing NVM-based in-memory file systems are all designed for uniform memory access systems. Their performance is unsatisfactory on NUMA machines because they consider neither the multi-node architecture nor the asymmetric memory access speeds. In this paper, we design an efficient NUMA-aware in-memory file system that distributes file data across all nodes to balance the load of file requests. We propose three approaches for improving file system performance on NUMA machines: a Node-oriented File Creation algorithm to dispatch files over multiple nodes, a File-oriented Thread Binding algorithm to bind threads to the most beneficial nodes, and a buffer assignment technique to allocate user buffers from the proper node. Based on this design, we implement a functional NUMA-aware in-memory file system, HydraFS, in the Linux kernel. Extensive experiments show that HydraFS significantly outperforms representative in-memory file systems on NUMA machines, with average performance 76.6%, 91.9%, and 26.7% higher than EXT4-DAX, PMFS, and SIMFS, respectively.
- Published
- 2019
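The Node-oriented File Creation idea above can be sketched as least-loaded dispatch (hypothetical class and method names; the real HydraFS does this inside kernel allocation paths), with thread binding following the placement decision.

```python
# Sketch of NUMA-aware file dispatch (hypothetical, illustrating the
# HydraFS idea): each new file goes to the node currently holding the
# fewest file bytes, so requests are balanced instead of filling node 0.

class NumaFS:
    def __init__(self, nodes):
        self.load = [0] * nodes          # bytes of file data per node
        self.placement = {}              # file name -> node

    def create(self, name, size):
        """Node-oriented file creation: pick the least-loaded node."""
        node = min(range(len(self.load)), key=self.load.__getitem__)
        self.placement[name] = node
        self.load[node] += size
        return node

    def bind_thread(self, name):
        """File-oriented thread binding: run the accessor where the data is."""
        return self.placement[name]

fs = NumaFS(nodes=4)
for i in range(8):
    fs.create(f"f{i}", size=4096)
# Equal-sized files spread evenly: two files' worth of data per node.
assert fs.load == [8192] * 4
assert fs.bind_thread("f5") == fs.placement["f5"]
```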
15. FitCNN: A cloud-assisted and low-cost framework for updating CNNs on IoT devices
- Authors
Yujuan Tan, Chaoshu Yang, Liang Liang, Jinting Ren, Duo Liu, Xianzhang Chen, Moming Duan, Renping Liu, and Shiming Li
- Subjects
Artificial neural network, Contextual image classification, Computer Networks and Communications, Computer science, Real-time computing, Cloud computing, Convolutional neural network, Upload, User experience design, Hardware and Architecture, Overhead (computing), Mobile device, Software
- Abstract
Recently, convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in image classification and recognition tasks. CNNs are usually deployed in the cloud to handle data collected from IoT devices, such as smartphones and unmanned systems. However, significant data transmission overhead and privacy issues have made it necessary to run CNNs directly on the device side. Nevertheless, a trained model deployed on mobile devices cannot effectively handle unknown data and objects in new environments, which leads to low accuracy and poor user experience. Hence, it is crucial to re-train a better model on future unknown data. However, given the tremendous computing cost and memory usage, training a CNN on IoT devices with limited hardware resources is impractical, so using the power of the cloud to assist mobile devices in training a deep neural network is a promising solution. This paper proposes a cloud-assisted CNN framework, named FitCNN, with incremental learning and low data transmission, to reduce the overhead of updating CNNs deployed on devices. To reduce data transmission during incremental learning, we propose a strategy, called Distiller, to selectively upload only the data that is worth learning, and an extracting strategy, called Juicer, to choose a small portion of weights from the new CNN model generated on the cloud to update the corresponding old ones on devices. Experimental results show that Distiller can reduce upload data transmission by 39.4% on a representative dataset, and Juicer reduces update data transmission by more than 60% across multiple CNNs and datasets.
- Published
- 2019
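The Distiller selection policy above can be sketched as a confidence filter (hypothetical scoring and threshold; the paper's criterion for "worth learning" is richer): only samples the on-device model is unsure about, i.e. with low top-class confidence, are uploaded for cloud-side incremental training.

```python
# Sketch of upload selection for incremental learning (hypothetical,
# illustrating the Distiller idea in FitCNN).

def worth_uploading(softmax_probs, threshold=0.8):
    """A sample is uploaded when the model's best guess is weak."""
    return max(softmax_probs) < threshold

predictions = [
    [0.95, 0.03, 0.02],    # confident: keep on device
    [0.40, 0.35, 0.25],    # uncertain: upload for re-training
    [0.85, 0.10, 0.05],    # confident: keep on device
    [0.50, 0.45, 0.05],    # uncertain: upload
]
uploads = [i for i, p in enumerate(predictions) if worth_uploading(p)]
assert uploads == [1, 3]   # only 2 of 4 samples cross the network
```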
16. Efficient persistent memory file systems using virtual superpages with multi-level allocator
- Authors
Chaoshu Yang, Zhiwang Yu, Runyu Zhang, Shun Nie, Hui Li, Xianzhang Chen, Linbo Long, and Duo Liu
- Subjects
Hardware and Architecture, Software
- Published
- 2022
17. CoDiscard: A revenue model based cross-layer cooperative discarding mechanism for flash memory devices
- Authors
Xiaoliu Feng, Xianzhang Chen, Ruolan Li, Jiali Li, Chunlin Song, Duo Liu, Yujuan Tan, and Lei Qiao
- Subjects
Hardware and Architecture, Software
- Published
- 2022
18. Towards highly-concurrent leaderless state machine replication for distributed systems
- Authors
Weilue Wang, Yujuan Tan, Changze Wu, Duo Liu, Yu Wu, Longpan Luo, and Xianzhang Chen
- Subjects
Hardware and Architecture, Software
- Published
- 2022
19. DWARM: A wear-aware memory management scheme for in-memory file systems
- Authors
Lin Wu, Linfeng Cheng, Qingfeng Zhuge, Edwin H.-M. Sha, and Xianzhang Chen
- Subjects
Computer Networks and Communications, Path (computing), Computer science, Memory bus, Data structure, Memory management, Hardware and Architecture, Embedded system, Overhead (computing), Persistent data structure, Software, DRAM
- Abstract
Emerging non-volatile memories (NVMs) promise to revolutionize storage systems by providing fast, persistent data access on the memory bus. A hybrid NVM/DRAM architecture that combines faster, volatile DRAM with slightly slower, denser NVM can harness the characteristics of both technologies. To fully exploit NVM, state-of-the-art in-memory file systems are designed to provide high performance and strong consistency guarantees. However, the free space management schemes of existing in-memory file systems can easily create "hot spots" when updating data structures on NVM, leading to significant skewness in the writes received by each data page. In this paper, we propose the dynamic wear-aware range management (DWARM) scheme, a novel free space management technique for in-memory file systems that achieves wear leveling together with high allocation/deallocation performance. The essential idea is to serve each allocation request with less-written pages. Specifically, the scheme associates a write counter with each data page and updates the counters in the file write path. We build an index structure to quickly locate the pages that have received fewer writes: it divides the NVM pages into subranges according to their write counters, and allocation always starts from the minimal subrange. We also propose an adaptive wear range determination algorithm to adjust the wear ranges dynamically. To accelerate lookup, we keep the index in DRAM and avoid the overhead of strong consistency by rebuilding the index after a system failure. Experimental results show that the scheme provides 4.9× to 158.1× wear-leveling improvement compared to state-of-the-art memory management schemes. For application workloads, DWARM improves the lifetime of NVM by up to 125×, 39×, and 25× compared with the standard memory management schemes of PMFS, NOVA, and SIMFS.
- Published
- 2018
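The write-counter-driven allocation above can be sketched with a min-heap standing in for DWARM's subrange index (hypothetical structure; the real scheme groups pages into dynamically adjusted wear subranges rather than keeping a full heap):

```python
# Sketch of wear-aware page allocation (hypothetical, illustrating the
# DWARM idea): a write counter per NVM page feeds an index that always
# surfaces the least-written free page first.

import heapq

class WearAwareAllocator:
    def __init__(self, npages):
        self.writes = [0] * npages
        # Min-heap as a tiny stand-in for DWARM's subrange index.
        self.free = [(0, p) for p in range(npages)]
        heapq.heapify(self.free)

    def alloc(self):
        _, page = heapq.heappop(self.free)
        self.writes[page] += 1               # the page is about to be written
        return page

    def dealloc(self, page):
        # Re-insert keyed by the current wear, so worn pages sink down.
        heapq.heappush(self.free, (self.writes[page], page))

alloc = WearAwareAllocator(npages=4)
for _ in range(40):                          # hot alloc/free loop
    p = alloc.alloc()
    alloc.dealloc(p)
# Writes spread evenly instead of hammering page 0.
assert max(alloc.writes) - min(alloc.writes) <= 1
```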
20. Heterogeneous FPGA-Based Cost-Optimal Design for Timing-Constrained CNNs
- Authors
Lei Yang, Qingfeng Zhuge, Jingtong Hu, Edwin H.-M. Sha, Weiwen Jiang, and Xianzhang Chen
- Subjects
Optimization problem, Speedup, Cost efficiency, Data parallelism, Computer science, Pipeline (computing), Task parallelism, Computer Graphics and Computer-Aided Design, Dynamic programming, Reduction (complexity), Memory management, Computer engineering, Electrical and Electronic Engineering, Software
- Abstract
Field programmable gate array (FPGA) has been one of the most popular platforms to implement convolutional neural networks (CNNs) due to its high performance and cost efficiency; however, limited by the on-chip resources, the existing single-FPGA architectures cannot fully exploit the parallelism in CNNs. In this paper, we explore heterogeneous FPGA-based designs to effectively leverage both task and data parallelism, such that the resultant system can achieve the minimum cost while satisfying timing constraints. In order to maximize the task parallelism, we investigate two critical problems: 1) buffer placement , where to place buffers to partition CNNs into pipeline stages and 2) task assignment , what type of FPGA to implement different CNN layers. We first formulate the system-level optimization problem with a mixed integer linear programming model. Then, we propose an efficient dynamic programming algorithm to obtain the optimal solutions. On top of that, we devise an efficient algorithm that exploits data parallelism within CNN layers to further improve cost efficiency. Evaluations on well-known CNNs demonstrate that the proposed techniques can obtain an average of 30.82% reduction in system cost under the same timing constraint, and an average of 1.5 times speedup in performance under the same cost budget, compared with the state-of-the-art techniques.
- Published
- 2018
21. UMFS: An efficient user-space file system for non-volatile memory
- Authors
Xianzhang Chen, Ting Wu, Lin Wu, Edwin H.-M. Sha, Weiwen Jiang, Zeng Xiaoping, and Qingfeng Zhuge
- Subjects
File system, Data consistency, Computer science, Non-volatile memory, File size, Virtual address space, Hardware and Architecture, Journaling file system, User space, Software
- Abstract
Emerging non-volatile memory (NVM) is expected to become a mainstream storage medium in embedded systems for its low power consumption, near-DRAM speed, high density, and byte-addressability. In-memory file systems have been proposed to achieve high-performance file access by storing files in NVM. Existing in-memory file systems, such as NOVA and EXT4-DAX, operate in kernel space and incur additional overhead from kernel layers and mode switches. In this paper, we propose a new design, the User-space in-Memory File System (UMFS), to improve file access speed by minimizing kernel overhead. We implement UMFS in Linux to verify the proposed design. On open, UMFS exposes a file into user space in constant time, independent of the file size. UMFS then achieves high-performance file access by taking advantage of the user virtual address space and the existing address translation hardware in processors. We also propose an efficient user-space journaling scheme that ensures data consistency while minimizing kernel cost. Extensive experiments on standard benchmarks compare UMFS with NOVA, EXT4-DAX, and SIMFS, the state-of-the-art in-memory file systems. The experimental results show that UMFS outperforms all of the existing in-memory file systems.
- Published
- 2018
22. Towards the Design of Efficient and Consistent Index Structure with Minimal Write Activities for Non-Volatile Memory
- Authors
Runyu Zhang, Xianzhang Chen, Zhulin Ma, Weiwen Jiang, Edwin H.-M. Sha, Hailiang Dong, and Qingfeng Zhuge
- Subjects
Speedup, CPU cache, Computer science, Search engine indexing, Linked list, Parallel computing, Data structure, Theoretical Computer Science, Database index, Tree (data structure), Tree structure, Computational Theory and Mathematics, Data retrieval, Hardware and Architecture, Search algorithm, Software
- Abstract
Index structures can significantly accelerate data retrieval in data-intensive systems such as databases. Tree structures, such as the B+-tree and its variants, are commonly employed as index structures; however, we find that tree structures may not be appropriate for non-volatile memory (NVM) given its requirements for high performance and high endurance. This paper studies what the best index structure for NVM-based systems is and how to design such index structures. The design of an NVM-friendly index structure faces several challenges. First, to prolong the lifetime of NVM, write activity on NVM should be minimized, so the index structure should be as simple as possible; the index proposed in this paper is based on the simplest data structure, the linked list. Second, such a simple structure makes high-performance data retrieval challenging. To overcome this, we design a novel technique that explicitly builds a contiguous virtual address space over the linked list so that efficient search algorithms can be applied. Third, we must carefully consider data consistency in NVM-based systems, because the order of memory writes may be changed and the data in NVM may become inconsistent due to the write-back effects of the CPU cache. This paper devises a novel indexing scheme called Virtual Linear Addressable Buckets (VLAB). We implement VLAB in a storage engine and plug it into MySQL. Evaluations are conducted on an NVDIMM workstation using YCSB workloads and real-world traces. Results show that the write activity of state-of-the-art indexes is 6.98× that of VLAB, while VLAB achieves a 2.53× speedup.
- Published
- 2018
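The idea of layering a contiguous, binary-searchable view over a plain linked list can be sketched as follows (hypothetical in-memory model; the real VLAB builds the linear view by mapping NVM bucket pages into contiguous virtual addresses through the page table):

```python
# Sketch of Virtual Linear Addressable Buckets (hypothetical model):
# buckets remain simple linked-list nodes (cheap, few NVM writes), yet a
# linear "window" over them restores O(log n) binary search.

from bisect import bisect_left

class Bucket:
    def __init__(self, key, value):
        self.key, self.value, self.next = key, value, None

class VLAB:
    def __init__(self):
        self.head = None
        self.window = []          # virtual linear view: buckets in key order

    def insert(self, key, value):
        node = Bucket(key, value)
        i = bisect_left([b.key for b in self.window], key)
        # Linked-list pointer update: one small write, no tree rebalancing.
        if i > 0:
            node.next = self.window[i - 1].next
            self.window[i - 1].next = node
        else:
            node.next = self.head
            self.head = node
        self.window.insert(i, node)   # volatile view, rebuildable on failure

    def search(self, key):
        """Binary search over the linear view instead of an O(n) list walk."""
        i = bisect_left([b.key for b in self.window], key)
        if i < len(self.window) and self.window[i].key == key:
            return self.window[i].value
        return None

idx = VLAB()
for k in [30, 10, 20, 40]:
    idx.insert(k, f"v{k}")
assert idx.search(20) == "v20" and idx.search(25) is None
```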
23. ELOFS: An Extensible Low-overhead Flash File System for Resource-scarce Embedded Devices
- Authors
Runyu Zhang, Duo Liu, Xianzhang Chen, Xiongxiong She, Chaoshu Yang, Yujuan Tan, Zhaoyan Shen, Zili Shao, and Lei Qiao
- Subjects
Computational Theory and Mathematics, Hardware and Architecture, Software, Theoretical Computer Science
- Published
- 2022
24. A machine learning assisted data placement mechanism for hybrid storage systems
- Authors
Duo Liu, Jinting Ren, Xianzhang Chen, Moming Duan, Liang Liang, Yujuan Tan, and Ruolan Li
- Subjects
Hybrid storage system, Computer science, Machine learning, File size, Data access, Hardware and Architecture, Hybrid storage, Artificial intelligence, Software, Data placement
- Abstract
Emerging applications produce massive numbers of files that differ in size, lifetime, and read/write frequency. Existing hybrid storage systems place these files onto different storage media assuming that the access patterns of files are fixed. However, we find that the access patterns of files change during their lifetime. The key to improving file access performance is to adaptively place files on the hybrid storage system using the run-time status and the properties of both the files and the storage media. In this paper, we propose a machine learning assisted data placement mechanism that adaptively places files onto the proper storage medium by predicting their access patterns. We design a PMFS-based tracer to collect file access features for prediction and show how this approach adapts to changing access patterns. Based on the prediction results, we present a linear data placement algorithm to optimize data access performance on the hybrid storage media. Extensive experimental results show that the proposed learning algorithm achieves over 90% accuracy in predicting file access patterns, and the proposed mechanism improves system performance for file accesses by over 17% compared with state-of-the-art linear-time data placement methods.
- Published
- 2021
- Full Text
- View/download PDF
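The adaptive, prediction-driven placement idea in the abstract above can be sketched in a few lines. This is an illustrative toy, not the paper's mechanism: the predictor is a hand-written scoring rule standing in for the learned classifier, and the feature names (`read_freq`, `write_freq`, `age_s`) and threshold are hypothetical.

```python
# Sketch of access-pattern-driven placement (illustrative only; the paper's
# predictor and features are not reproduced here). A file predicted "hot"
# goes to the fast medium, everything else to the slow one.

def predict_hot(features):
    """Toy stand-in for the learned classifier: score recent accesses
    relative to the file's age. All feature names are hypothetical."""
    score = 0.6 * features["read_freq"] + 0.4 * features["write_freq"]
    return score / max(features["age_s"], 1) > 0.5

def place(files):
    """Assign each file to the 'fast' or 'slow' tier from its prediction."""
    return {name: ("fast" if predict_hot(f) else "slow")
            for name, f in files.items()}

files = {
    "log.txt": {"read_freq": 90, "write_freq": 80, "age_s": 10},    # hot
    "old.tar": {"read_freq": 1,  "write_freq": 0,  "age_s": 3600},  # cold
}
print(place(files))  # {'log.txt': 'fast', 'old.tar': 'slow'}
```

A real system would re-run the prediction periodically, which is what lets placement track the changing access patterns the paper observes.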
25. MobileRE: A replicas prioritized hybrid fault tolerance strategy for mobile distributed system
- Author
-
Yujuan Tan, Jinting Ren, Yu Wu, Duo Liu, Ziling Zhang, Renping Liu, and Xianzhang Chen
- Subjects
Dynamic network analysis, Hardware and Architecture, Computer science, Replica, Distributed computing, Failure probability, Bandwidth (computing), Data reliability, Fault tolerance, Erasure code, Software, Reliability (statistics) - Abstract
Fault tolerance techniques are vital for ensuring data reliability in mobile distributed systems. In mobile environments, nodes suffer from high failure probability and fluctuating bandwidth, so traditional fault tolerance techniques are no longer suitable. In this paper, we present MobileRE, a replica-prioritized hybrid fault tolerance strategy that combines erasure codes and replicas to guarantee data reliability for a mobile distributed system under dynamic network conditions. In MobileRE, we first formulate a reliability cost rate that captures the cost of ensuring data reliability in the mobile system. MobileRE then adaptively applies erasure coding or replication based on real-time network bandwidth to minimize the system reliability cost rate, and obtains the optimal rate by tuning the redundancy configuration parameters. Numerical and simulation results verify the effectiveness of the proposed schemes and show that, compared with traditional designs that adopt only erasure codes or only replicas, MobileRE significantly reduces the system reliability cost rate.
- Published
- 2021
- Full Text
- View/download PDF
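The bandwidth-driven switch between erasure coding and replication described above can be illustrated with a deliberately simplified cost model. This is not MobileRE's reliability cost rate formulation; the (4, 2) code, the 3-copy replication, and the repair-penalty term are all assumptions made for the example.

```python
# Illustrative selector between 3-way replication and (k, m) erasure coding.
# The cost model is a simplification, not the paper's formulation.

def replication_cost(size_mb, copies=3):
    # Storage overhead dominates; repair needs only one remote copy.
    return copies * size_mb

def erasure_cost(size_mb, k=4, m=2, bandwidth_mbps=10):
    # Storage overhead is (k+m)/k, plus a repair penalty that grows as
    # bandwidth shrinks, since repair must fetch k fragments.
    storage = (k + m) / k * size_mb
    repair_penalty = k * size_mb / bandwidth_mbps
    return storage + repair_penalty

def choose(size_mb, bandwidth_mbps):
    ec = erasure_cost(size_mb, bandwidth_mbps=bandwidth_mbps)
    return "erasure" if ec < replication_cost(size_mb) else "replication"

# Ample bandwidth favors the storage-efficient erasure code; scarce
# bandwidth favors replicas with their cheap single-copy repair.
print(choose(100, bandwidth_mbps=1000))  # erasure
print(choose(100, bandwidth_mbps=2))     # replication
```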
26. Refinery swap: An efficient swap mechanism for hybrid DRAM–NVM systems
- Author
-
Qingfeng Zhuge, Ting Wu, Weiwen Jiang, Xianzhang Chen, Edwin H.-M. Sha, and Chaoshu Yang
- Subjects
Computer Networks and Communications, Computer science, Refinery, Non-volatile memory, Hardware and Architecture, Operating system, Swap (computer programming), Software, DRAM - Abstract
Emerging Non-Volatile Memory (NVM) technologies show great promise for enabling high-performance swapping in embedded systems. Most existing swap mechanisms perform poorly because they lack knowledge of memory access behavior and incur large swap overheads by entirely avoiding direct writes to NVM. In this paper, we identify the phenomenon of "write count disparity": most pages are rarely written, while most writes are concentrated on a few pages. With this observation in mind, we rethink and re-design the swap mechanism to reduce the number of swap operations and writes to NVM in hybrid DRAM–NVM systems by tolerating small writes to NVM. We propose a new swap mechanism, Refinery Swap, with a (1+ε)-competitive algorithm for swap-in operations, a multilevel priority algorithm for selecting the victim pages of swap-out operations, and a swap-based wear-leveling algorithm for NVM. Extensive experiments are conducted with standard benchmarks. Compared with Dr.Swap, the state-of-the-art swap mechanism, Refinery Swap reduces swap operations and writes to NVM by more than 90%, and achieves encouraging improvements over existing swap mechanisms in performance, energy consumption, and NVM lifetime.
- Published
- 2017
- Full Text
- View/download PDF
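The "write count disparity" observation above suggests a victim-selection policy: pages that are rarely written can live on NVM cheaply, while write-hot pages should stay in DRAM. The sketch below is illustrative only; the priority levels and the write-count thresholds are invented for the example and are not Refinery Swap's actual multilevel priority algorithm.

```python
# Sketch of victim selection exploiting write count disparity: prefer to
# swap out pages with few writes, keeping write-hot pages in DRAM.
# Levels and thresholds are hypothetical.

def priority(page):
    """Lower value = better swap-out victim for NVM."""
    if page["writes"] == 0:
        return 0            # read-only: ideal NVM resident
    if page["writes"] <= 4:
        return 1            # small writes: tolerable on NVM
    return 2                # write-hot: keep in DRAM

def pick_victim(pages):
    """Coldest write level first; least-recently-used within a level."""
    return min(pages, key=lambda p: (priority(p), -p["idle_ticks"]))

pages = [
    {"id": "A", "writes": 120, "idle_ticks": 900},  # write-hot
    {"id": "B", "writes": 2,   "idle_ticks": 50},
    {"id": "C", "writes": 0,   "idle_ticks": 10},   # read-only
]
print(pick_victim(pages)["id"])  # C
```

Tolerating the occasional small write to NVM is what removes the need to migrate a page back to DRAM on every store, cutting swap traffic.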
27. Optimal functional unit assignment and voltage selection for pipelined MPSoC with guaranteed probability on time performance
- Author
-
Edwin H.-M. Sha, Weiwen Jiang, Hailiang Dong, Xianzhang Chen, and Qingfeng Zhuge
- Subjects
Computer science, Multiprocessing, Parallel computing, MPSoC, Computer Graphics and Computer-Aided Design, Set (abstract data type), On-time performance, Throughput (business), Software, Efficient energy use - Abstract
Pipelined heterogeneous multiprocessor system-on-chip (MPSoC) platforms can provide high throughput for streaming applications. In the design of such systems, time performance and system cost are the primary concerns. By analyzing the runtime behavior of benchmarks on real-world platforms, we find that task execution times are not fixed but spread probabilistically. Based on this observation, we model task execution times as random variables. In this paper, we study how to design high-performance, low-cost MPSoC systems that execute a set of such tasks with data dependencies in a pipelined fashion. Our objective is to obtain the optimal functional unit assignment and voltage selection for the pipelined MPSoC system such that system cost is minimized while timing constraints are met with a given guaranteed probability. For each required probability, our proposed algorithm efficiently obtains the optimal solution. Experiments show that existing algorithms cannot find feasible solutions in most cases, while ours can; even for the solutions other algorithms do find, ours achieves 30% reductions in total cost.
- Published
- 2017
- Full Text
- View/download PDF
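Modeling execution times as random variables, as the abstract above does, means the probability of meeting a deadline for dependent tasks comes from combining their distributions; for independent tasks in sequence this is a convolution. The distributions below are made up for illustration and have nothing to do with the paper's benchmarks.

```python
# Sketch: with task execution times as discrete random variables, the
# probability that a two-task chain meets a deadline is obtained by
# convolving their time distributions. Example distributions are invented.

def convolve(d1, d2):
    """Distribution of the sum of two independent discrete times {time: prob}."""
    out = {}
    for t1, p1 in d1.items():
        for t2, p2 in d2.items():
            out[t1 + t2] = out.get(t1 + t2, 0.0) + p1 * p2
    return out

def prob_meets(dist, deadline):
    """Probability that total time is within the deadline."""
    return sum(p for t, p in dist.items() if t <= deadline)

task_a = {2: 0.8, 5: 0.2}   # usually 2 cycles, occasionally 5
task_b = {3: 0.9, 6: 0.1}
total = convolve(task_a, task_b)
print(round(prob_meets(total, 8), 2))  # 0.98
```

A design-space search like the paper's would evaluate such a probability for each candidate functional-unit assignment and keep only candidates meeting the required guarantee.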
28. Efficient Data Placement for Improving Data Access Performance on Domain-Wall Memory
- Author
-
Edwin H.-M. Sha, Qingfeng Zhuge, Chun Jason Xue, Xianzhang Chen, Weiwen Jiang, and Wang Yuangang
- Subjects
Computer science, Locality, Parallel computing, Data access, Hardware and Architecture, Algorithm design, Electrical and Electronic Engineering, Integer programming, Software - Abstract
Domain-wall memory (DWM) is becoming an attractive candidate to replace traditional memories owing to its high density, low leakage power, and low access latency. Accessing data on DWM is accomplished by shift operations that move data along nanowires to read/write ports. Because of this construction, data accesses on DWM exhibit varying latencies, so the data placement (DP) strategy has a significant impact on data access performance. In this paper, we prove the NP-completeness of the DP problem on DWM. For DWMs organized as a single DWM block cluster (DBC), we present integer linear programming formulations that solve the problem optimally. We also propose an efficient single-DBC placement (S-DBC-P) algorithm that exploits multiple read/write ports and data locality. Compared with the sequential DP strategy, S-DBC-P reduces shift operations by 76.9% on average for eight-port DWMs. Furthermore, for the DP problem on DWMs organized in multiple DBCs, we develop an efficient multiple-DBC placement (M-DBC-P) algorithm that utilizes the parallelism of DBCs. Experimental results show that M-DBC-P achieves a 90% performance improvement over the sequential DP strategy.
- Published
- 2016
- Full Text
- View/download PDF
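The core cost structure described above is easy to see in miniature: on a single-port nanowire, accessing a word costs shifts proportional to its distance from the port, so hot words belong next to the port. The toy below uses a greedy frequency ordering purely for illustration; it is not the paper's ILP formulation or its S-DBC-P/M-DBC-P algorithms.

```python
# Sketch: shift cost of a placement on a one-port nanowire, and a greedy
# placement putting the most-accessed words nearest the port.

def shift_cost(placement, accesses, port=0):
    """Total shifts: each access moves the wire until the word is at the port."""
    pos = {word: i for i, word in enumerate(placement)}
    return sum(abs(pos[w] - port) for w in accesses)

def greedy_place(freq):
    """Order words by descending access frequency, hottest at the port."""
    return [w for w, _ in sorted(freq.items(), key=lambda kv: -kv[1])]

accesses = list("ccbcbcacb")                 # c:5, b:3, a:1
freq = {w: accesses.count(w) for w in set(accesses)}
sequential = sorted(freq)                    # name-order placement: a, b, c
tuned = greedy_place(freq)                   # frequency order: c, b, a
print(shift_cost(sequential, accesses), shift_cost(tuned, accesses))  # 13 5
```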
29. A New Design of In-Memory File System Based on File Virtual Address Framework
- Author
-
Xianzhang Chen, Edwin H.-M. Sha, Liang Shi, Weiwen Jiang, and Qingfeng Zhuge
- Subjects
Computer science, Stub file, Theoretical Computer Science, Persistence (computer science), Design rule for Camera File system, Versioning file system, SSH File Transfer Protocol, File system fragmentation, Flash file system, File system, Random access memory, Address space, Computer file, Device file, Everything is a file, Unix file types, Virtual file system, Torrent file, Memory-mapped file, File Control Block, Self-certifying File System, Computational Theory and Mathematics, Virtual address space, Hardware and Architecture, Journaling file system, Operating system, Software - Abstract
Emerging persistent memory technologies, such as PCM and MRAM, provide opportunities for keeping files in memory, so traditional file system structures may need to be re-examined. Although several file systems have been proposed for memory, most deliver limited performance because they do not fully utilize the hardware on the processor side. This paper presents a framework based on a new concept, the "File Virtual Address Space". We design and implement a file system, the Sustainable In-Memory File System (SIMFS), that fully utilizes the memory mapping hardware on the file access path: SIMFS embeds the address space of an open file into the process' address space, so that file accesses are handled by the memory mapping hardware. Several optimizations for SIMFS are also presented. Extensive experiments show that SIMFS achieves significant throughput improvements over state-of-the-art in-memory file systems.
- Published
- 2016
- Full Text
- View/download PDF
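The key idea above, letting the memory mapping hardware handle file accesses instead of per-access system calls, can be demonstrated with an ordinary memory-mapped file. Python's `mmap` here merely stands in for SIMFS's file-virtual-address-space machinery; the file path and sizes are arbitrary.

```python
# Sketch: once a file is mapped into the process' address space, reads and
# writes become plain loads and stores resolved by the MMU, with no
# read()/write() syscall per access.

import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)              # one page of file data

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 4096) as m:
        m[0:5] = b"hello"                # a store, not a write() syscall
        data = bytes(m[0:5])             # a load, not a read() syscall

print(data)  # b'hello'
```

SIMFS goes further by wiring the whole file's address space into the process at open time, so the mapping cost is not paid access by access.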
30. A unified framework for designing high performance in-memory and hybrid memory file systems
- Author
-
Weiwen Jiang, Jun Xu, Edwin H.-M. Sha, Junxi Chen, Jun Chen, Xianzhang Chen, and Qingfeng Zhuge
- Subjects
File system, Computer science, Computer file, Stub file, Unix file types, Memory-mapped file, File Control Block, Hardware and Architecture, Operating system, Versioning file system, Software, Flash file system - Abstract
Emerging non-volatile memory technologies provide a new option for storing persistent data in memory, so file system structures need to be re-studied and re-designed. Our goal is a framework that delivers high-performance in-memory file accesses and allows a file's data to be stored across memory and a block device. This paper presents a novel unified framework for in-memory and hybrid-memory file systems based on the concept that each file has a contiguous "File Virtual Address Space". Within this framework, accesses to in-memory file data are handled efficiently by the address translation hardware through the file's virtual address space, while accesses to data on block devices are handled by a dedicated page fault handler in the file system. We implement a file system, the Hybrid Memory File System (HMFS), based on this framework. Experimental results show that the throughput of HMFS approaches the memory bus bandwidth in the best cases. Compared with in-memory file systems, HMFS is 5 times, 2.1 times, and 1.6 times faster than EXT4 on Ramdisk, RAMFS, and PMFS, respectively; compared with EXT4 on SSD and EXT4 using the page cache, HMFS achieves 100 times and tens of times performance improvement, respectively.
- Published
- 2016
- Full Text
- View/download PDF
31. Effective file data-block placement for different types of page cache on hybrid main memory architectures
- Author
-
Edwin H.-M. Sha, Weiwen Jiang, Xianzhang Chen, Penglin Dai, and Qingfeng Zhuge
- Subjects
Page fault, Computer science, Cache coloring, Cache-only memory architecture, Parallel computing, Cache pollution, Memory-mapped file, File Control Block, Hardware and Architecture, Operating system, Page cache, Software, Flash file system - Abstract
Hybrid main memory architectures employing both DRAM and non-volatile memories (NVMs) are becoming increasingly attractive because they can combine the benefits of different memory technologies, for example, fast writes on DRAM and low stand-by power consumption on NVMs. File data-block placement (FDP) across different types of page cache is an important problem that directly impacts the performance and cost of file operations on a hybrid main memory architecture. Page cache is widely used in modern operating systems to expedite file I/O by mapping disk-backed file data-blocks in main memory into a process' virtual address space. In a hybrid main memory, the operating system can allocate page cache from memory types with different read/write costs. In this paper, we study the problem of placing file data-blocks across different types of page cache to minimize the total cost of file accesses in a program. We propose a dynamic programming algorithm, the FDP algorithm, that solves the problem optimally for simple programs, and develop an ILP model for programs composed of multiple regions with data dependencies. An efficient heuristic, the global file data-block placement (GFDP) algorithm, is proposed to obtain near-optimal solutions for global file data-block placement on hybrid main memory. Experiments on a set of benchmarks show the effectiveness of GFDP compared with a greedy strategy and the ILP; GFDP reduces the total cost of file accesses by 51.3% on average compared with the greedy strategy.
- Published
- 2013
- Full Text
- View/download PDF
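Choosing which blocks get the limited fast page cache, as in the abstract above, has a knapsack flavor: each block placed in DRAM saves its access count times the per-access cost gap. The sketch below is a generic 0/1-knapsack dynamic program under made-up costs, not the paper's FDP or GFDP algorithm.

```python
# Sketch: give scarce DRAM page-cache slots to the blocks whose accesses
# save the most versus the slower NVM cache. Per-access costs are invented.

def place_blocks(blocks, dram_slots, dram_cost=1, nvm_cost=4):
    """0/1 knapsack DP over access counts: maximize the total saving
    (nvm_cost - dram_cost) * accesses within DRAM capacity.
    Returns the minimized total access cost."""
    saving_per = nvm_cost - dram_cost
    base = sum(acc * nvm_cost for acc in blocks)   # everything on NVM
    best = [0] * (dram_slots + 1)                  # best saving per capacity
    for acc in blocks:
        for cap in range(dram_slots, 0, -1):       # each block used at most once
            best[cap] = max(best[cap], best[cap - 1] + acc * saving_per)
    return base - best[dram_slots]

# Four blocks with access counts 10, 1, 7, 2 and only two DRAM slots:
# the two hottest blocks (10 and 7) take DRAM, the rest stay on NVM.
print(place_blocks([10, 1, 7, 2], dram_slots=2))  # 29
```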