74 results for "Asynchronous I/O"
Search Results
2. An Asynchronous Parallel I/O Framework for Mass Conservation Ocean Model.
- Author
- Pang, Renbo, Yu, Fujiang, Zhang, Yu, and Yuan, Ye
- Subjects
- CONSERVATION of mass, GENERAL circulation model, OCEAN circulation, SPATIAL resolution, OCEAN
- Abstract
I/O is often a performance bottleneck in global ocean circulation models with fine spatial resolution. In this paper, we present an asynchronous parallel I/O framework and demonstrate its efficacy in the Mass Conservation Ocean Model (MaCOM) as a case study. By largely reducing I/O operations in the computing processes and overlapping output in the I/O processes with computation in the computing processes, this framework significantly improves the performance of MaCOM. I/O optimization algorithms that reorder output data to maintain data continuity and combine file accesses to reduce file operations further improve output bandwidth. In the MaCOM case study, up to 99% of the output cost in the I/O processes can be overlapped with computation as the output frequency decreases. With these optimizations, 1D data output bandwidth is 3.1 times higher than before optimization at 16 I/O worker processes. Compared to a synchronous parallel I/O framework, the overall performance of MaCOM improves by 38.8% at 1024 computing processes for a 7-day global ocean forecast with one output every 2 hours.
- Published
- 2023
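The producer/consumer split this abstract describes, where compute processes hand each output snapshot to dedicated I/O processes and immediately resume computing, can be sketched in a few lines. This is a minimal illustration, not MaCOM's implementation; the names `io_worker` and `run_forecast` are invented, and the `fork` start method assumes a POSIX system:

```python
import multiprocessing as mp
import os, tempfile

def io_worker(q, path):
    """Dedicated I/O process: drains queued snapshots and writes them to disk,
    overlapping file output with computation in the compute process."""
    with open(path, "w") as f:
        while True:
            item = q.get()
            if item is None:          # sentinel: no more output
                break
            step, field = item
            f.write(f"step {step}: {field}\n")

def run_forecast(n_steps, path):
    """Compute loop that hands each output snapshot to the I/O process and
    immediately resumes computing, instead of blocking on the write."""
    ctx = mp.get_context("fork")      # POSIX-only; keeps the sketch simple
    q = ctx.Queue()
    writer = ctx.Process(target=io_worker, args=(q, path))
    writer.start()
    state = 0.0
    for step in range(n_steps):
        state += 1.0                  # stand-in for one model time step
        q.put((step, state))          # asynchronous hand-off to the I/O process
    q.put(None)
    writer.join()                     # final synchronization point
    return state
```

As in the paper's framework, the compute side pays only the cost of the queue hand-off; the actual file writes proceed concurrently in the I/O process.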
3. A Lightweight Asynchronous I/O System for Non-volatile Memory
- Author
- Luo, Jiebin, Zhang, Weijie, Li, Dingding, Luo, Haoyu, and Zeng, Deze
- Published
- 2022
5. Transparent Asynchronous Parallel I/O Using Background Threads.
- Author
- Tang, Houjun, Koziol, Quincey, Ravi, John, and Byna, Suren
- Subjects
- INFORMATION retrieval, OPTICAL disks, DATA warehousing, APPLICATION stores, DATA analysis
- Abstract
Moving toward exascale computing, the size of data stored and accessed by applications is ever increasing. However, traditional disk-based storage has not seen improvements that keep up with the explosion of data volume or the speed of processors. Multiple levels of non-volatile storage devices are being added to handle bursty I/O; however, moving data across the storage hierarchy can take longer than the data generation or analysis itself. Asynchronous I/O can reduce the impact of I/O latency, as it allows applications to schedule I/O early and check its status later. I/O is thus overlapped with application communication, computation, or both, effectively hiding some or all of the I/O latency. POSIX and MPI-I/O provide asynchronous read and write operations but lack support for non-data operations such as file open and close. Users also have to manually manage data dependencies and use low-level byte offsets, which requires significant effort and expertise. In this article, we present an asynchronous I/O framework that supports all types of I/O operations, manages data dependencies transparently and automatically, provides implicit and explicit modes for application flexibility, and supports error information retrieval. We implemented these techniques in HDF5. Our evaluation of several benchmarks and application workloads demonstrates its effectiveness at hiding the I/O cost from the application.
- Published
- 2022
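The background-thread approach this abstract describes can be mimicked with a single worker thread plus futures. This is a toy sketch of the general idea, not the authors' HDF5 implementation; the class name `BackgroundIO` is invented:

```python
import threading, queue
from concurrent.futures import Future

class BackgroundIO:
    """Sketch of the background-thread idea: every operation -- open and close
    included, not just read/write -- is queued to one worker thread, which
    executes them in submission order, so data dependencies between successive
    operations are satisfied automatically."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:               # shutdown sentinel
                return
            fut, fn, args = task
            try:
                fut.set_result(fn(*args))
            except Exception as exc:
                fut.set_exception(exc)     # error info retrievable via the future

    def submit(self, fn, *args):
        """Schedule an operation early; check its status (or result) later."""
        fut = Future()
        self._tasks.put((fut, fn, args))
        return fut

    def shutdown(self):
        self._tasks.put(None)
        self._thread.join()
```

A caller can, for example, `submit(open, path, "w")` followed by submitted writes and a close, then continue computing; the in-order worker guarantees the open completes before the writes run.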
6. Boosting Compaction in B-Tree Based Key-Value Store by Exploiting Parallel Reads in Flash SSDs
- Author
- Jongbaeg Lee, Gihwan Oh, and Sang-Won Lee
- Subjects
- Asynchronous I/O, compaction, flash memory SSD, ForestDB, io_uring, libaio
- Abstract
Append-only B-tree based key-value stores provide superior search and update performance thanks to their structural characteristics; however, they periodically require a compaction task that incurs significant I/O overhead. In this paper, we show that the compaction's degraded read performance hurts overall performance in ForestDB, a representative append-only B-tree engine. We demonstrate that, despite the exceptional performance of the SSD, the cause of the slow reads is underutilization of the SSD's internal parallelism, because the read operations use synchronous I/O. We then propose a novel compaction method that improves compaction read performance by exploiting the SSD's internal parallelism, issuing multiple read operations in a batch via asynchronous I/O. We implemented the proposed method in ForestDB using two Linux asynchronous I/O interfaces, AIO and io_uring. The evaluation confirms that our method improves compaction read performance by up to ten times over the conventional compaction method. In particular, the variant using io_uring, the latest asynchronous I/O interface, is effective regardless of the file I/O mode and outperforms the others in all cases.
- Published
- 2021
7. PM-AIO: An Effective Asynchronous I/O System for Persistent Memory
- Author
- Yong Tang, Kaoru Ota, Niyang Zhang, Hao Chen, Dingding Li, and Mianxiong Dong
- Subjects
- Asynchronous I/O, persistent memory, concurrency, IOPS, DRAM
- Abstract
Due to their expected near-DRAM performance, local persistent memory (PM) file systems disable the usual async I/O systems and instead use a pseudo-async I/O path (in reality a synchronous one) to serve the async I/O (AIO) requests of applications. This paper first identifies the performance shortcomings of this approach and argues for applying real async I/O on PM devices. It then proposes PM-AIO, a general method for creating an effective async I/O path on PM file systems. PM-AIO leverages kernel-level threads to achieve real asynchrony and concurrency. We implement PM-AIO in the Native AIO paths of PMFS and NOVA, respectively, and conduct extensive experiments on a real PM platform. Compared with the original I/O methods of PM file systems, the results show that PM-AIO can reduce the latency of AIO requests by up to three orders of magnitude while delivering up to 2.11× the IOPS on realistic workloads containing relatively large I/O operations (often > 4 KB). Meanwhile, PM-AIO incurs up to 4% overhead when handling small I/O operations (often 4 KB) because of its inherent overhead.
- Published
- 2022
8. AUTOMATED STORAGE MANAGEMENT.
- Author
- Falak, Ujwala and Shinde, Shubhangi S.
- Subjects
- COMPUTER storage devices, AUTOMATION, ORACLE software, DATABASE management
- Abstract
The Oracle Database (commonly referred to as Oracle RDBMS, or simply Oracle) is an object-relational database management system (ORDBMS) produced and marketed by Oracle Corporation. In recent IT practice, shared storage has been gaining importance: it is the main and critical requirement for building and managing a Real Application Clusters (RAC) deployment. Using ASM-based shared storage for a RAC database is the method recommended by Oracle. A major motivation for ASM is that it removes the management hassles of raw devices while retaining their performance advantage; it also offers better management and tuning of I/O activity and eliminates the need for separate volume management. The case study in this paper covers the storage-management benefits, including direct I/O, asynchronous I/O, striping, mirroring, and load balancing. The study shows how shared storage facilitates ease of administration and storage reliability, and provides guidelines for managing Oracle Real Application Clusters (RAC).
- Published
- 2018
9. NV-eCryptfs: Accelerating Enterprise-Level Cryptographic File System with Non-Volatile Memory.
- Author
- Xiao, Chunhua, Zhang, Lei, Liu, Weichen, Cheng, Linfeng, Li, Pengda, Pan, Yanyue, and Bergmann, Neil
- Subjects
- PARALLEL programming, RANSOMWARE, MEMORY
- Abstract
The development of cloud computing and big data results in large amounts of data being transmitted and stored. To protect sensitive data from leakage and unauthorized access, many cryptographic file systems, such as eCryptfs, transparently encrypt file contents before storing them on storage devices. However, the time-consuming encryption operations cause serious performance degradation. We found that, compared with the non-crypto file system EXT4, the slowdown with eCryptfs can be up to 58.53 and 86.89 percent for read and write, respectively. Although prior work has improved the efficiency of cryptographic file systems through computation acceleration, no solution has addressed the inefficient working flow, which we demonstrate to be a major factor affecting system performance. To address this open problem, we present NV-eCryptfs, an asynchronous software stack for eCryptfs that uses NVM as a fast storage tier on top of slower block devices to fully parallelize encryption and data I/O. We design an efficient NVM management scheme to support fast parallel cryptographic operations. Besides providing an address space that hardware accelerators can access directly, the mechanism records memory allocation state and supplies a fallback for NVM shortage. An additional index structure accelerates lookups that determine whether a given data block resides in NVM. Moreover, we integrate adaptive scheduling in NV-eCryptfs to process I/O requests dynamically according to access pattern and request size, fully utilizing both software and hardware acceleration to boost crypto performance. Our evaluation shows that NV-eCryptfs outperforms the original eCryptfs with its software routine by 23.41× and 5.82× for read and write, respectively.
- Published
- 2019
11. Building of online evaluation system based on socket protocol
- Author
- Haijian Chen, Hai Sun, Peng Jiang, and Kexin Yan
- Subjects
- Asynchronous I/O, socket protocol, scalability, online evaluation
- Abstract
As an important part of evaluation reform, an online evaluation system can effectively improve the efficiency of evaluation work and has attracted attention from teaching institutions. Such a system must support safe and stable transmission of information between client and server; the socket protocol establishes connections through a listening port and makes message transmission and process control straightforward. Because it meets these construction requirements well, it is adopted in our study. Building an online evaluation system on the socket protocol involves the function design for students and teachers, data-flow design, evaluation difficulty grading, and system implementation. The system is developed in Java using the MVC pattern, giving it good scalability and platform independence. It realizes a paperless examination process and greatly reduces the workload of teachers. The contribution of this paper is twofold. First, it explores the construction of an online evaluation system based on the socket protocol and provides an asynchronous I/O solution for network communication between students and the server, offering a reference for the development of similar systems. Second, it gives a method for classifying the difficulty of test questions, laying the foundation for personalized testing and evaluation.
- Published
- 2022
12. Demystifying Asynchronous I/O Interference in HPC Applications
- Author
- Aparna Chandramowlishwaran, Bogdan Nicolae, Franck Cappello, and Shu Mei Tseng
- Subjects
- Asynchronous I/O, interference, data management, HPC workflows
- Abstract
With the increasing complexity of HPC workflows, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources: CPU cycles and memory/network bandwidth. The advent of multi-core architectures has exacerbated this problem, as many I/O operations are issued concurrently, competing not only with the application but also among themselves. Furthermore, interference patterns can change dynamically in response to variations in application behavior and I/O subsystems (e.g., multiple users sharing a parallel file system). Without a thorough understanding, I/O operations may perform suboptimally, potentially even worse than in the blocking case. To fill this gap, this paper investigates the causes and consequences of interference due to asynchronous I/O on HPC systems. Specifically, we focus on multi-core CPUs and memory bandwidth, isolating the interference due to each resource. We then perform an in-depth study explaining the interplay and contention in a variety of resource-sharing scenarios, such as varying the priority and number of background I/O threads, and different I/O strategies (sendfile, read/write, mmap/write), underlining the trade-offs. The insights from this study both enable guided optimizations of existing background I/O and open new opportunities to design advanced asynchronous I/O strategies.
- Published
- 2021
14. Bridging Storage Semantics Using Data Labels and Asynchronous I/O
- Author
- Hariharan Devarajan, Xian-He Sun, and Anthony Kougkas
- Subjects
- Asynchronous I/O, data labels, software-defined storage, big data, in situ analytics
- Abstract
In the era of data-intensive computing, large-scale applications in both the scientific and BigData communities demonstrate unique I/O requirements, leading to a proliferation of different storage devices and software stacks, many of which have conflicting requirements. Further, new hardware technologies and system designs create a hierarchical composition that may be ideal for computational storage operations. In this article, we investigate how to support a wide variety of conflicting I/O workloads under a single storage system. We introduce the Label, a new data representation, and present LABIOS: a new distributed, Label-based I/O system. LABIOS boosts I/O performance by up to 17× via asynchronous I/O, supports heterogeneous storage resources, offers storage elasticity, and promotes in situ analytics and software-defined storage via data provisioning. LABIOS demonstrates the effectiveness of storage bridging in supporting the convergence of HPC and BigData workloads on a single platform.
- Published
- 2020
15. Performance Evaluation of Server-side JavaScript for Healthcare Hub Server in Remote Healthcare Monitoring System.
- Author
- Nkenyereye, Lionel and Jang, Jong-Wook
- Subjects
- CLIENT/SERVER computing, JAVASCRIPT programming language, MEDICAL informatics, WEARABLE technology, INFORMATION technology
- Abstract
With the help of small wearable devices, patients residing in an isolated village can receive the constant monitoring they need, which may increase access to care and decrease healthcare delivery costs. As the number of simultaneous patient requests grows, however, the healthcare hub server located in the village hall struggles to handle them concurrently. In this paper, we propose the design tasks of the Remote Healthcare Monitoring application for handling concurrency. Concurrency is best understood by employing multiple levels of abstraction, and a proven way to accomplish it is to build an object-oriented environment with support for message passing between concurrent objects. Node.js, a cross-platform runtime environment, features technologies for handling concurrency efficiently in a Remote Healthcare Monitoring System. The experimental results show that server-side JavaScript with Node.js and MongoDB as the database is 40% faster than Apache Sling. With Node.js, developers can build a high-performance, asynchronous, event-driven healthcare hub server to handle an increasing number of concurrent connections for a Remote Healthcare Monitoring System in an isolated village with no access to local medical care.
- Published
- 2016
16. MLBS: Transparent Data Caching in Hierarchical Storage for Out-of-Core HPC Applications
- Author
- Rached Abdelkhalak, Tariq Alturkestani, Hatem Ltaief, David E. Keyes, V. Etienne, and Thierry Tonellot
- Subjects
- NVM Express, Lustre (file system), out-of-core algorithms, asynchronous I/O, context switching
- Abstract
Out-of-core simulation systems produce and/or consume massive amounts of data that cannot fit in a single compute node's memory and usually need to be read and written back and forth during computation. I/O data movement may thus become a bottleneck in large-scale simulations. To increase I/O bandwidth, high-end supercomputers are equipped with hierarchical storage subsystems such as node-local and remote-shared NVMe and SSD-based burst buffers. Advanced caching systems have recently been developed to efficiently utilize the multi-layered nature of this new storage hierarchy, delivering more efficient data accesses at the cost of reduced compute-kernel performance and a limited number of simultaneous applications that can use the additional storage layers. We introduce MultiLayered Buffer Storage (MLBS), a data object container providing novel methods for caching and prefetching data in out-of-core scientific applications so that expensive I/O operations run asynchronously on systems equipped with hierarchical storage. The main idea is to decouple I/O operations from computational phases, using dedicated hardware resources to perform the expensive context switches. MLBS monitors I/O traffic in each storage layer, allowing fair utilization of shared resources while controlling the impact on kernel performance. By continually prefetching up and down across all hardware layers of the memory/storage subsystems, MLBS transforms the originally I/O-bound behavior of the evaluated applications and shifts it closer to a memory-bound regime. Our evaluation on a Cray XC40 system for a representative I/O-bound application, seismic inversion, shows that MLBS outperforms the state-of-the-art systems Lustre, Data Elevator, and DataWarp by 6.06×, 2.23×, and 1.90×, respectively.
- Published
- 2019
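The decoupling of I/O from computation via background prefetching that MLBS performs across storage tiers can be illustrated at a much smaller scale. The generator below is a hedged sketch of the pattern only (the name `prefetched` and its parameters are invented): a dedicated thread stages up to `depth` upcoming items while the consumer computes on the current one, hiding the load latency.

```python
import threading, queue

def prefetched(load, keys, depth=2):
    """Yield load(k) for each key, with a background thread staying up to
    `depth` items ahead of the consumer so expensive loads overlap with the
    consumer's computation."""
    staged = queue.Queue(maxsize=depth)   # bounded: caps memory for staged data
    done = object()                       # unique end-of-stream marker

    def producer():
        for k in keys:
            staged.put(load(k))           # expensive I/O runs ahead of use
        staged.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while (item := staged.get()) is not done:
        yield item
```

The bounded queue plays the role of the buffer layer: the producer blocks once `depth` items are staged, which keeps the prefetcher from racing arbitrarily far ahead of the consumer.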
17. A New Disk I/O Model of Virtualized Cloud Environment.
- Author
- Li, Dingding, Liao, Xiaofei, Jin, Hai, Zhou, Bingbing, and Zhang, Qi
- Subjects
- COMPUTER input-output equipment, CLOUD computing, VIRTUAL machine systems, SYSTEM failures, HARD disks, ELECTRONIC file management, MATHEMATICAL models
- Abstract
In a traditional virtualized cloud environment, using asynchronous I/O in the guest file system and synchronous I/O in the host file system to handle an asynchronous user disk write exhibits several drawbacks, such as performance disturbance among different guests and consistency maintenance across guest failures. To address these issues, this paper introduces HypeGear, a novel disk I/O model for virtualized cloud systems in which the guest file system uses synchronous operations to handle guest write requests while the host file system performs asynchronous operations to write the data to the hard disk. A prototype is implemented on the Xen hypervisor, and our experimental results verify that this new model has many advantages over the conventional asynchronous-synchronous model. We also evaluate the host-side overhead of asynchronous I/O introduced by the new model; the results demonstrate that it imposes little cost on the host layer.
- Published
- 2013
18. Heterogeneity-Aware Collective I/O for Parallel I/O Systems with Hybrid HDD/SSD Servers
- Author
- Chuanhe Huang, Xian-He Sun, Shuibing He, Chenzhong Xu, and Yang Wang
- Subjects
- Parallel I/O, collective I/O, heterogeneous servers, solid-state drives, middleware
- Abstract
Collective I/O is a widely used middleware technique that exploits I/O access correlation among multiple processes to improve I/O system performance. However, most existing implementations of collective I/O are designed and optimized for homogeneous I/O systems. In practice, the homogeneity assumption does not hold in heterogeneous parallel I/O systems, which consist of multiple HDD- and SSD-based servers and are becoming increasingly common. In this paper, we propose a heterogeneity-aware collective I/O (HACIO) strategy to enhance the performance of conventional collective I/O operations. HACIO reorganizes the order of I/O requests for each aggregator with awareness of the storage performance of the heterogeneous servers, so that the systems' hardware can be better utilized. We have implemented HACIO in ROMIO, a widely used MPI-IO library. Experimental results show that HACIO significantly increases the I/O throughput of heterogeneous I/O systems.
- Published
- 2017
19. Enabling Transparent Asynchronous I/O using Background Threads
- Author
- Suren Byna, Tonglin Li, Houjun Tang, John Mainzer, and Quincey Koziol
- Subjects
- Asynchronous I/O, POSIX, Hierarchical Data Format (HDF5), data access
- Abstract
With scientific applications moving toward exascale, an increasing amount of data is produced and analyzed, and providing efficient data access is crucial to the productivity of the scientific discovery process. Compared to improvements in CPU and network speeds, I/O performance lags far behind, such that moving data across the storage hierarchy can take longer than data generation or analysis. To alleviate this I/O bottleneck, the POSIX and MPI-I/O interfaces provide asynchronous read and write operations that can overlap I/O with computation and thus hide I/O latency. However, these standards lack support for non-data operations such as file open, stat, and close, and their read and write operations require users to both manually manage data dependencies and use low-level byte offsets, which takes significant effort and expertise. To overcome these issues, we present an asynchronous I/O framework that supports all I/O operations and manages data dependencies transparently and automatically. Our prototype implementation as an HDF5 VOL connector demonstrates the effectiveness of hiding the I/O cost from the application, with low overhead and an easy-to-use programming interface.
- Published
- 2019
20. XLCS: A New Bit-Parallel Longest Common Subsequence Algorithm on Xeon Phi Clusters
- Author
- Zekun Yin, Hao Zhang, Kai Xu, Yuandong Chan, Shaoliang Peng, Xiaoning Wang, Bertil Schmidt, and Weiguo Liu
- Subjects
- Longest common subsequence, dynamic programming, bit-parallelism, Xeon Phi, computer cluster
- Abstract
Finding the longest common subsequence (LCS) of two strings is a classical problem in bioinformatics, for which a basic solution is based on dynamic programming. As biological sequence databases grow continuously, bit-parallel sequence comparison algorithms are becoming increasingly important. In this paper, we present XLCS, a new parallel implementation that accelerates the LCS algorithm on Xeon Phi clusters by performing bit-wise operations. We have designed an asynchronous I/O framework to improve data transfer efficiency. To make full use of the computing resources of Xeon Phi clusters, we use three levels of parallelism: node-level, thread-level, and vector-level. We also propose a segmentation method to decrease cache misses. Our performance evaluation shows that XLCS achieves a peak performance of 3.61 TCUPS on a single 31S1P card and 8.20 TCUPS on a single 7210 card. Compared to OCS, a well-parallelized LCS algorithm implemented on three M2090 cards, XLCS achieves a speedup of 3.6 on a KNC and 8.1 on a KNL, respectively. XLCS further achieves 93.84 TCUPS on 16 compute nodes of the Tianhe-2 supercomputer and 312.30 TCUPS on 32 compute nodes of a KNL cluster. To our knowledge, this is the first reported implementation of the bit-parallel LCS algorithm on Xeon Phi clusters. XLCS is available at https://github.com/wxiaoning/Longest-Common-Subsequence.
- Published
- 2019
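The bit-parallel core of an LCS-length computation of the kind XLCS accelerates can be sketched with a classic bit-vector recurrence (in the Crochemore/Hyyrö style): one bitmask per character of the first string, then a word-sized update per character of the second. This single-word Python sketch is only an illustration of the technique, not XLCS's vectorized Xeon Phi code.

```python
def bitparallel_lcs_length(a: str, b: str) -> int:
    """LCS length of a and b via bit-parallel column updates:
    a handful of word operations per character of b."""
    m = len(a)
    if m == 0 or len(b) == 0:
        return 0
    mask = (1 << m) - 1
    # Precompute, per character, a bitmask of its positions in a.
    pm = {}
    for i, ch in enumerate(a):
        pm[ch] = pm.get(ch, 0) | (1 << i)
    v = mask                      # all ones: no matches consumed yet
    for ch in b:
        u = v & pm.get(ch, 0)     # match positions still available
        v = ((v + u) | (v - u)) & mask
    # Zero bits in v mark rows consumed by the LCS.
    return m - bin(v).count("1")
```

For example, `bitparallel_lcs_length("AGGTAB", "GXTXAYB")` returns 4 (the subsequence "GTAB"). On real hardware the same update vectorizes across machine words, which is what makes the approach attractive for long biological sequences.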
21. Design of Asynchronous Non-block Server for Agricultural IOT
- Author
-
Lei Yu, Han Qiu, Yong Chang, and Jun Huai Li
- Subjects
Computer science ,business.industry ,Connection pool ,Node (networking) ,04 agricultural and veterinary sciences ,02 engineering and technology ,020202 computer hardware & architecture ,Upload ,Data acquisition ,Asynchronous communication ,040103 agronomy & agriculture ,0202 electrical engineering, electronic engineering, information engineering ,0401 agriculture, forestry, and fisheries ,Asynchronous I/O ,business ,Message queue ,Computer network ,Block (data storage) - Abstract
Aiming at the problems of collecting, pretreating, and transmitting orchard environmental monitoring data, this thesis researches and develops an intelligent environment-acquisition node based on a 4G network. To improve the server's ability to concurrently process data uploaded by the intelligent acquisition nodes, this thesis designs and develops a data acquisition server based on the Asynchronous I/O (AIO) model, optimizes server performance in multiple aspects such as the message queue, session connection pool, and dynamic buffering, and improves communication efficiency. The thesis also tests the acquisition server, using throughput and response time as evaluation indices to determine the optimal and maximum numbers of requests the server can process concurrently. Experiments show that the server performs well when handling highly concurrent data-upload requests and can meet the application requirements of the system.
- Published
- 2019
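The AIO server model described above (an event loop accepting many concurrent uploads, decoupled from back-end processing through a message queue) can be sketched with Python's asyncio. The line-based protocol, function names, and ACK format here are hypothetical stand-ins, not the thesis's actual design.

```python
import asyncio

async def handle_upload(reader, writer, q):
    """One connection: read newline-delimited sensor readings, queue them
    for back-end processing, and acknowledge, without blocking other
    clients (the event loop multiplexes all connections)."""
    while True:
        line = await reader.readline()
        if not line:
            break
        await q.put(line.strip())   # hand off to the message queue
        writer.write(b"ACK\n")
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def run_server(host="127.0.0.1", port=0):
    """Start the acquisition server; returns (server, message queue)."""
    q = asyncio.Queue()
    server = await asyncio.start_server(
        lambda r, w: handle_upload(r, w, q), host, port)
    return server, q
```

A single thread services every node's upload; back-end consumers drain the queue at their own pace, which is the decoupling the abstract credits for high concurrency.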
22. Addressing Concurrency Design for HealthCare Web Service Gateway in Remote Healthcare Monitoring System
- Author
-
Lionel Nkenyereye and Jong-Wook Jang
- Subjects
Ajax ,Computer science ,business.industry ,Concurrency ,010401 analytical chemistry ,020206 networking & telecommunications ,02 engineering and technology ,Gateway (computer program) ,computer.software_genre ,01 natural sciences ,Web API ,0104 chemical sciences ,World Wide Web ,Remote healthcare ,Health care ,0202 electrical engineering, electronic engineering, information engineering ,Asynchronous I/O ,Web service ,business ,computer ,computer.programming_language
- Published
- 2016
23. Performance Evaluation of Server-side JavaScript for Healthcare Hub Server in Remote Healthcare Monitoring System
- Author
-
Lionel Nkenyereye and Jong-Wook Jang
- Subjects
healthcare hub server ,server-side JavaScript ,Computer science ,Concurrency ,Internet of Things ,02 engineering and technology ,computer.software_genre ,JavaScript ,asynchronous I/O ,0202 electrical engineering, electronic engineering, information engineering ,concurrent application ,Server-side ,General Environmental Science ,Abstraction (linguistics) ,computer.programming_language ,remote healthcare monitoring system ,020206 networking & telecommunications ,020207 software engineering ,web services ,Asynchronous communication ,Operating system ,General Earth and Planetary Sciences ,Asynchronous I/O ,Web service ,Node.js ,computer - Abstract
With the help of a small wearable device, patients residing in an isolated village can receive the constant monitoring that may increase access to care and decrease healthcare delivery cost. As the number of simultaneous patient requests increases, the healthcare hub server located in the village hall reaches its limits for handling them successfully and concurrently. In this paper, we propose the design tasks of the Remote Healthcare Monitoring application for handling concurrency. In designing these tasks, concurrency is best understood by employing multiple levels of abstraction. An effective way to accomplish concurrency is to build an object-oriented environment with support for message passing between concurrent objects. Node.js, a cross-platform runtime environment, features technologies for handling concurrency efficiently in the Remote Healthcare Monitoring System. The experimental results show that server-side JavaScript with Node.js and MongoDB as the database is 40% faster than Apache Sling. With Node.js, developers can build a high-performance, asynchronous, event-driven healthcare hub server to handle an increasing number of concurrent connections for a Remote Healthcare Monitoring System in an isolated village with no access to local medical care.
- Published
- 2016
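An evaluation of the kind described above (driving a server with many simultaneous requests and recording throughput and mean response time) can be sketched as a minimal asyncio load generator. The embedded echo server and one-line protocol are stand-ins for the system under test, not the healthcare hub server itself.

```python
import asyncio
import time

async def echo(reader, writer):
    """Trivial stand-in for the server under test."""
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()

async def timed_request(port):
    """One request; returns its response time in seconds."""
    t0 = time.perf_counter()
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping\n")
    await writer.drain()
    await reader.readline()
    writer.close()
    return time.perf_counter() - t0

async def measure(n_clients=50):
    """Issue n_clients concurrent requests; return
    (throughput in requests/second, mean response time in seconds)."""
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    t0 = time.perf_counter()
    latencies = await asyncio.gather(
        *(timed_request(port) for _ in range(n_clients)))
    elapsed = time.perf_counter() - t0
    server.close()
    await server.wait_closed()
    return n_clients / elapsed, sum(latencies) / n_clients
```

Sweeping `n_clients` upward until throughput plateaus and response time climbs reproduces, in miniature, the "best and maximum concurrency" methodology these entries describe.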
24. Understanding I/O Bottlenecks and Tuning for High Performance I/O on Large HPC Systems
- Author
-
Dong Ju Choi, Mahidhar Tatineni, Manu Shantharam, and Amitava Majumdar
- Subjects
Profiling (computer programming) ,NetCDF ,File system ,Computer science ,computer.file_format ,Supercomputer ,computer.software_genre ,01 natural sciences ,Parallel I/O ,010305 fluids & plasmas ,Computational science ,0103 physical sciences ,Comet (programming language) ,Asynchronous I/O ,Node (circuits) ,computer - Abstract
As we move toward peta-to-exascale machines, large-scale physics-based simulations are expected to generate a large amount of I/O traffic, driven by unprecedented growth in the volume and types of data. It is imperative to understand and characterize the I/O behavior of scientific applications, including complex checkpoint/restart options, on different hardware-software configurations, including large shared parallel file systems, node-local flash, and burst buffer technologies, to tune and improve overall application performance. In this work, we study the I/O behavior of WRF, a widely used scientific application for atmospheric research and operational weather forecasting, on high performance computing systems. WRF provides a rich collection of I/O strategies, such as using different parallel I/O libraries (PnetCDF, NetCDF) and I/O quilting options with these libraries, as well as configurable I/O "knobs" that can be used to modify the I/O frequency. We evaluate the effectiveness of using various I/O strategies within WRF in conjunction with parallel file system parameter tuning on the Comet and Stampede2 HPC systems. We discuss the impact of using various parallel I/O strategies and further show the use of an I/O profiling tool to analyze an anomalous parallel I/O behavior. Overall, we provide a discussion of tuning and performance insights gained from our evaluations.
- Published
- 2018
25. Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation
- Author
-
Qing Liu, Xiaosong Ma, Jong Youl Choi, Jeremy Logan, Scott Klasky, Mingliang Liu, Ye Jin, and Norbert Podhorszki
- Subjects
Computer Networks and Communications ,business.industry ,Computer science ,Phase (waves) ,Computer Graphics and Computer-Aided Design ,Identification (information) ,Software ,Computer engineering ,Hardware and Architecture ,Benchmark (computing) ,Asynchronous I/O ,business ,Statistic ,TRACE (psycholinguistics) - Abstract
Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, and reconfigure, and are often plainly inaccessible due to security or ownership concerns. This work contributes APPrime, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPrime benchmarks. They retain the original applications' performance characteristics, in particular their relative performance across platforms. Also, the resulting benchmarks, already released online, are much more compact and easier to port than the original applications.
- Published
- 2015
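The statistical-regeneration half of APPrime's approach can be sketched as re-sampling event parameters from the empirical distribution observed in each identified phase. This toy version assumes phases have already been identified and says nothing about APPrime's actual statistical model; names and the trace encoding are hypothetical.

```python
import random

def regenerate_trace(phases, seed=0):
    """Simplified phase-wise regeneration: `phases` is a list of
    (event_count, observed_sizes) pairs, one per identified phase.
    Each synthetic phase re-samples request sizes from that phase's
    empirical distribution, so per-phase statistics are preserved
    without shipping the original trace."""
    rng = random.Random(seed)
    synthetic = []
    for count, sizes in phases:
        synthetic.append([rng.choice(sizes) for _ in range(count)])
    return synthetic
```

The compactness win the abstract claims comes from storing only per-phase distributions and repeat counts instead of every traced event.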
26. Improving Scalability of Web Applications by Utilizing Asynchronous I/O
- Author
-
Gjorgji Rankovski and Ivan Chorbev
- Subjects
Syntax (programming languages) ,business.industry ,Computer science ,Distributed computing ,Process (computing) ,computer.software_genre ,Blocking (computing) ,World Wide Web ,Asynchronous communication ,Scalability ,Web application ,Asynchronous I/O ,Web service ,business ,computer - Abstract
The focus of this paper is the use of asynchronous I/O calls in web applications to improve their scalability, by increasing the number of requests per second they can process and decreasing the average response time of the system. Popular development frameworks have long included only blocking I/O APIs in their base, making asynchronous I/O methods hard to implement and maintain. Significant effort has been made in recent years to enrich these frameworks with better syntax for asynchronous APIs, improving the developer experience and encouraging their use. Such an improvement in .NET's syntax is put to the test in this paper, and the results are presented and evaluated.
- Published
- 2017
27. MERCURY: A Transparent Guided I/O Framework for High Performance I/O Stacks
- Author
-
James Morse, Federico Padua, Tim Süß, Giuseppe Congiu, Matthias Grawinkel, and André Brinkmann
- Subjects
File system ,POSIX ,Computer science ,Scalability ,Non-blocking I/O ,Operating system ,Network File System ,Asynchronous I/O ,Linux kernel ,Lustre (file system) ,computer.software_genre ,computer - Abstract
The performance gap between processors and I/O represents a serious scalability limitation for applications running on computing clusters. Parallel file systems often provide mechanisms that allow programmers to disclose their I/O pattern knowledge to the lower layers of the I/O stack through a hints API. This information can be used by the file system to boost the application performance. Unfortunately, programmers rarely make use of these features, missing the opportunity to exploit the full potential of the storage system. In this paper we propose MERCURY, a transparent guided I/O framework able to optimize file I/O patterns in scientific applications, allowing users to control the I/O behavior of applications without modifications. This is done by exploiting the hints API provided by the back-end file system to guide data prefetching. MERCURY efficiently converts numerous small read requests into a few larger requests. Furthermore, it increases the I/O bandwidth, reduces the number of I/O requests and, ultimately, the application running time. Moreover, we also propose a Linux kernel modification that allows network file systems, specifically Lustre, to work with our guided I/O framework through the posix_fadvise interface.
- Published
- 2017
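The hint interface MERCURY builds on is exposed to ordinary applications on POSIX systems as `posix_fadvise`. A minimal sketch (Unix-only; `os.posix_fadvise` is not available on Windows, and the kernel is free to ignore the hint) that discloses an access pattern before issuing many small reads; the function name and layout are illustrative:

```python
import os

def guided_read(path, offsets, length):
    """Read many small regions, but first disclose the covering range to
    the kernel so it can prefetch asynchronously instead of servicing
    each small read synchronously."""
    fd = os.open(path, os.O_RDONLY)
    try:
        lo, hi = min(offsets), max(offsets) + length
        # Hint: we will need this whole range soon (one large prefetch).
        os.posix_fadvise(fd, lo, hi - lo, os.POSIX_FADV_WILLNEED)
        return [os.pread(fd, length, off) for off in offsets]
    finally:
        os.close(fd)
```

MERCURY's contribution, per the abstract, is issuing such hints transparently on the application's behalf rather than requiring each program to call the API itself.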
28. Measuring the Characteristics of Hypervisor I/O Scheduling in the Cloud for Virtual Machine Performance Interference
- Author
-
Ziye Yang, Yingjun Wu, Haifeng Fang, and Chunqi Li
- Subjects
Software_OPERATINGSYSTEMS ,I/O scheduling ,Computer Networks and Communications ,Hardware virtualization ,Computer science ,business.industry ,Hypervisor ,Cloud computing ,computer.software_genre ,Storage hypervisor ,Scheduling (computing) ,Virtual machine ,Embedded system ,Operating system ,Asynchronous I/O ,business ,computer - Abstract
In virtualized environments, the customers who purchase virtual machines (VMs) from a third-party cloud would expect that their VMs run in an isolated manner. However, the performance of a VM can be negatively affected by co-resident VMs. In this paper, the authors propose vExplorer, a distributed VM I/O performance measurement and analysis framework, where one can use a set of representative I/O operations to identify the I/O scheduling characteristics within a hypervisor, and potentially leverage this knowledge to carry out I/O based performance attacks to slow down the execution of the target VMs. The authors evaluate their prototype on both Xen and VMware platforms with four server benchmarks and show that vExplorer is practical and effective. The authors also conduct similar tests on Amazon’s EC2 platform and successfully slow down the performance of target VMs.
- Published
- 2013
29. A New Disk I/O Model of Virtualized Cloud Environment
- Author
-
Hai Jin, Bing Bing Zhou, Dingding Li, Xiaofei Liao, and Qi Zhang
- Subjects
File system ,Input/output ,business.industry ,Computer science ,Hypervisor ,Cloud computing ,Virtualization ,computer.software_genre ,Computational Theory and Mathematics ,Hardware and Architecture ,Asynchronous communication ,Embedded system ,Signal Processing ,Operating system ,Overhead (computing) ,Asynchronous I/O ,business ,computer ,Host (network) - Abstract
In a traditional virtualized cloud environment, using asynchronous I/O in the guest file system and synchronous I/O in the host file system to handle an asynchronous user disk write exhibits several drawbacks, such as performance disturbance among different guests and consistency maintenance across guest failures. To address these issues, this paper introduces a novel disk I/O model for virtualized cloud systems called HypeGear, in which the guest file system uses synchronous operations to handle guest write requests and the host file system performs asynchronous operations to write the data to the hard disk. A prototype system is implemented on the Xen hypervisor, and our experimental results verify that this new model has many advantages over the conventional asynchronous-synchronous model. We also evaluate the overhead of asynchronous I/O at the host, which is introduced by our new model. The results demonstrate that it imposes little cost on the host layer.
- Published
- 2013
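The two write disciplines the model combines, synchronous at one layer and asynchronous at the other, can be contrasted in a small Python sketch. The file handling and class names are hypothetical illustrations of the primitives, not HypeGear's Xen implementation.

```python
import os

def write_sync(path, data):
    """Synchronous write: the data is on stable storage before this
    returns (as HypeGear's guest-side discipline requires)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)                 # block until the device has the data
    finally:
        os.close(fd)

class AsyncWriter:
    """Asynchronous host-side discipline: buffer now, persist on flush(),
    so the caller returns immediately."""

    def __init__(self, path):
        self._path = path
        self._pending = []

    def write(self, data):
        self._pending.append(data)   # returns without touching the disk

    def flush(self):
        with open(self._path, "ab") as f:
            f.write(b"".join(self._pending))
            f.flush()
            os.fsync(f.fileno())
        self._pending.clear()
```

The paper's point is about which layer should carry which discipline: acknowledging the guest only after a synchronous step bounds the damage of guest failures, while batching at the host keeps throughput high.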
30. Banzai+Tatoo: Using cutting-edge parsers for implementing high-performance servers
- Author
-
Gautier Loyauté, Gilles Roussel, Julien Cervelle, Rémi Forax, Laboratoire d'Algorithmique Complexité et Logique (LACL), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12)-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'Informatique Gaspard-Monge (LIGM), Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM), Roussel, Gilles, and Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Non-blocking IO ,Web server ,Java ,Computer science ,02 engineering and technology ,computer.software_genre ,Server ,Rule-based machine translation ,Protocol ,0202 electrical engineering, electronic engineering, information engineering ,Protocol (object-oriented programming) ,ComputingMilieux_MISCELLANEOUS ,computer.programming_language ,Parsing ,Software engineering ,Programming language ,020206 networking & telecommunications ,020207 software engineering ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Memory footprint ,Operating system ,Asynchronous I/O ,computer ,Software - Abstract
This paper presents how the Tatoo parser generator enables the implementation of high-performance Java servers using the Banzai generic server shell. The performance of these servers relies on the ability of Tatoo to produce push non-blocking parsers with a fixed memory footprint during parsing, and on the generic and efficient server architecture of Banzai. This approach reconciles the use of formally defined grammars for protocol parsing with the efficiency of the implementation. We argue that the use of formal grammars simplifies the implementation of the protocol, and we show that an HTTP server built using Banzai+Tatoo is as efficient as several existing, specially tuned high-performance HTTP servers.
- Published
- 2012
- Full Text
- View/download PDF
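A push parser with a fixed memory footprint, of the kind Tatoo generates, can be sketched as a small state holder that is fed byte chunks as non-blocking reads deliver them. The HTTP request-line grammar, class name, and bound are illustrative; Tatoo generates such parsers from a formal grammar rather than by hand.

```python
class RequestLineParser:
    """Push parser for an HTTP request line ("GET /path HTTP/1.1\\r\\n").
    Bytes are pushed as they arrive from non-blocking reads; state is a
    bounded buffer plus a result slot, so memory stays fixed no matter
    how the input is chunked."""

    def __init__(self, max_line=8192):
        self._buf = bytearray()
        self._max = max_line
        self.result = None          # (method, target, version) when done

    def push(self, chunk: bytes) -> bool:
        """Feed one chunk; returns True once the request line is parsed."""
        if self.result is not None:
            return True
        self._buf += chunk
        if len(self._buf) > self._max:
            raise ValueError("request line too long")
        end = self._buf.find(b"\r\n")
        if end < 0:
            return False            # need more input; parser stays suspended
        method, target, version = bytes(self._buf[:end]).split(b" ", 2)
        self.result = (method, target, version)
        return True
```

Because the parser suspends itself between chunks instead of blocking a thread, one thread can drive thousands of such parsers, which is the property the Banzai architecture exploits.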
31. Optimizing Main Memory Usage in Modern Computing Systems to Improve Overall System Performance
- Author
-
Campello, Daniel Jose and Campello, Daniel Jose
- Abstract
Operating Systems use fast, CPU-addressable main memory to maintain an application’s temporary data as anonymous data and to cache copies of persistent data stored in slower block-based storage devices. However, the use of this faster memory comes at a high cost. Therefore, several techniques have been implemented to use main memory more efficiently in the literature. In this dissertation we introduce three distinct approaches to improve overall system performance by optimizing main memory usage. First, DRAM and host-side caching of file system data are used for speeding up virtual machine performance in today’s virtualized data centers. The clustering of VM images that share identical pages, coupled with data deduplication, has the potential to optimize main memory usage, since it provides more opportunity for sharing resources across processes and across different VMs. In our first approach, we study the use of content and semantic similarity metrics and a new algorithm to cluster VM images and place them in hosts where through deduplication we improve main memory usage. Second, while careful VM placement can improve memory usage by eliminating duplicate data, caches in current systems employ complex machinery to manage the cached data. Writing data to a page not present in the file system page cache causes the operating system to synchronously fetch the page into memory, blocking the writing process. In this thesis, we address this limitation with a new approach to managing page writes involving buffering the written data elsewhere in memory and unblocking the writing process immediately. This buffering allows the system to service file writes faster and with less memory resources. In our last approach, we investigate the use of emerging byte-addressable persistent memory technology to extend main memory as a less costly alternative to exclusively using expensive DRAM. We motivate and build a tiered memory system wherein persistent memory and DRAM co-exist and pr
- Published
- 2016
32. Remote MPI-I/O on a Parallel Virtual File System Using a Circular Buffer for High Throughput
- Author
-
Y. Tsujita
- Subjects
Unix ,File system ,Hardware_MEMORYSTRUCTURES ,Computer science ,Parallel computing ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Virtual file system ,Computer Science Applications ,Circular buffer ,Hardware and Architecture ,Operating system ,Asynchronous I/O ,Cache ,computer ,Throughput (business) ,Software ,Solaris Multiplexed I/O - Abstract
A flexible intermediate library named Stampi realizes seamless remote MPI-I/O operations on interconnected computers with the help of its MPI-I/O process, which is invoked on a remote computer. The MPI-I/O process carries out I/O operations using the vendor's MPI-I/O library according to I/O requests from user processes. If the vendor's library is not available, UNIX I/O functions are used instead. A parallel virtual file system (PVFS) was supported in the remote MPI-I/O mechanism for data-intensive applications. Although this mechanism provided parallel I/O operations on a PVFS file system, its performance with UNIX I/O functions was low. Attempts to obtain high throughput have been made for this case by adopting a circular buffer mechanism in the MPI-I/O process to cache part or all of the data. By optimizing the buffer configuration, remote MPI-I/O operations with UNIX I/O functions outperformed those with direct calls of PVFS I/O functions on a PVFS file system.
- Published
- 2007
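The circular buffer used to stage data in the MPI-I/O process can be sketched as a fixed-capacity byte ring between a producer (user processes) and a consumer (the remote I/O side). This is a generic illustration of the data structure, not Stampi's implementation.

```python
class CircularBuffer:
    """Fixed-capacity byte ring: writes wrap around and never allocate,
    so staging memory stays constant regardless of transfer volume."""

    def __init__(self, capacity):
        self._data = bytearray(capacity)
        self._cap = capacity
        self._head = 0      # next write position
        self._tail = 0      # next read position
        self._size = 0

    def write(self, chunk: bytes) -> int:
        """Append as much of chunk as fits; returns bytes accepted."""
        n = min(len(chunk), self._cap - self._size)
        for i in range(n):
            self._data[(self._head + i) % self._cap] = chunk[i]
        self._head = (self._head + n) % self._cap
        self._size += n
        return n

    def read(self, n: int) -> bytes:
        """Consume up to n buffered bytes."""
        n = min(n, self._size)
        out = bytes(self._data[(self._tail + i) % self._cap]
                    for i in range(n))
        self._tail = (self._tail + n) % self._cap
        self._size -= n
        return out
```

Tuning the capacity against the transfer sizes is the "optimizing configuration of the buffer" step the abstract credits for the throughput gain.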
33. Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation
- Author
-
Ye Jin, Qing Liu, Mingliang Liu, Norbert Podhorszki, Jong Youl Choi, Jeremy Logan, Xiaosong Ma, and Scott Klasky
- Subjects
business.industry ,Computer science ,Event (computing) ,Scale (chemistry) ,media_common.quotation_subject ,Fidelity ,Parallel computing ,computer.software_genre ,Identification (information) ,Software ,Benchmark (computing) ,Asynchronous I/O ,Compiler ,business ,computer ,TRACE (psycholinguistics) ,media_common - Abstract
Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, and reconfigure, and are often plainly inaccessible due to security or ownership concerns. This work contributes APPrime, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPrime benchmarks. They retain the original applications' performance characteristics, in particular their relative performance across platforms. Also, the resulting benchmarks, already released online, are much more compact and easier to port than the original applications.
- Published
- 2015
34. Asynchronous I/O
- Author
-
Jeff Friesen
- Subjects
Computer science ,business.industry ,Channel (programming) ,Server ,Scalability ,Non-blocking I/O ,Thread pool ,Code (cryptography) ,Asynchronous I/O ,Data_CODINGANDINFORMATIONTHEORY ,business ,Multiplexing ,Computer network - Abstract
NIO provides multiplexed I/O (a combination of nonblocking I/O, discussed in Chapter 7, and readiness selection, discussed in Chapter 8) to facilitate the creation of highly scalable servers. Client code registers a socket channel with a selector to be notified when the channel is ready to start I/O.
- Published
- 2015
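Readiness selection as described for Java NIO has a direct Python analogue in the `selectors` module: one selector watches the listening socket and every registered client channel, and a single thread serves whichever is ready. A minimal multiplexed echo sketch (function names are illustrative):

```python
import selectors
import socket

def make_echo_server():
    """Register a non-blocking listening socket with a selector,
    mirroring Java NIO's channel-registered-with-Selector pattern."""
    sel = selectors.DefaultSelector()
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ)
    return sel, srv

def poll_once(sel, srv, timeout=1.0):
    """One pass of the event loop: accept ready connections and echo
    whatever the ready clients sent; one thread multiplexes them all."""
    for key, _ in sel.select(timeout):
        if key.fileobj is srv:                 # listening socket is ready
            conn, _ = srv.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:                                  # a client channel is ready
            data = key.fileobj.recv(4096)
            if data:
                key.fileobj.sendall(data)
            else:                              # EOF: client went away
                sel.unregister(key.fileobj)
                key.fileobj.close()
```

Because no call blocks on any single channel, the server scales with the number of ready events rather than the number of open connections, which is the scalability argument the chapter makes.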
35. Scalable Design and Implementations for MPI Parallel Overlapping I/O
- Author
-
E. Russell, Lee Ward, N. Pundit, Alok Choudhary, Wei-keng Liao, and Kenin Coloma
- Subjects
Computer science ,Serialization ,Message Passing Interface ,Parallel computing ,computer.software_genre ,File locking ,File server ,Server ,Versioning file system ,File system fragmentation ,Input/output ,File system ,Atomicity ,File descriptor ,Message passing ,Device file ,Unix file types ,Memory-mapped file ,File Control Block ,Self-certifying File System ,Computational Theory and Mathematics ,Hardware and Architecture ,POSIX ,Signal Processing ,Asynchronous I/O ,Cache ,computer ,Cache coherence - Abstract
We investigate the message passing interface input/output (MPI I/O) implementation issues for two overlapping access patterns: the overlaps among processes within a single I/O operation and the overlaps across a sequence of I/O operations. The former case considers whether I/O atomicity can be obtained in the overlapping regions. The latter focuses on the file consistency problem on parallel machines with client-side file caching enabled. Traditional solutions for both overlapping I/O problems use whole file or byte-range file locking to ensure exclusive access to the overlapping regions and bypass the file system cache. Unfortunately, not only can file locking serialize I/O, but it can also increase the aggregate communication overhead between clients and I/O servers. For atomicity, we first differentiate MPI's requirements from the portable operating system interface (POSIX) standard and propose two scalable approaches, graph coloring and process-rank ordering, which can resolve access conflicts and maintain I/O parallelism. For solving the file consistency problem across multiple I/O operations, we propose a method called persistent file domains, which tackles cache coherency with additional information and coordination to guarantee safe cache access without using file locks
- Published
- 2006
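The graph-coloring idea above (processes whose byte ranges conflict are placed in different I/O phases, while non-conflicting processes keep their parallelism) can be sketched as follows. The greedy coloring and the request encoding are illustrative, not the paper's exact algorithm.

```python
def overlaps(r1, r2):
    """Byte ranges given as (offset, length); conflict iff they intersect."""
    a0, a1 = r1[0], r1[0] + r1[1]
    b0, b1 = r2[0], r2[0] + r2[1]
    return a0 < b1 and b0 < a1

def schedule_phases(requests):
    """Greedy graph coloring over the conflict graph: processes whose
    requests overlap get different colors; each color class is one
    conflict-free I/O phase. `requests` maps rank -> (offset, length);
    ranks are visited in order, so rank ordering breaks ties."""
    ranks = sorted(requests)
    color = {}
    for r in ranks:
        used = {color[s] for s in ranks
                if s in color and overlaps(requests[r], requests[s])}
        c = 0
        while c in used:
            c += 1
        color[r] = c
    phases = {}
    for r, c in color.items():
        phases.setdefault(c, []).append(r)
    return [sorted(phases[c]) for c in sorted(phases)]
```

Processes within one phase can issue their writes concurrently without locks, since no two of their ranges intersect; only the (usually few) phases are serialized, preserving most of the I/O parallelism.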
36. Evaluating network processing efficiency with processor partitioning and asynchronous I/O
- Author
-
JanakiramanG. (John), LynnBrian, SaletoreVikram, BrechtTim, and TurnerYoshio
- Subjects
Internet protocol suite ,business.industry ,computer.internet_protocol ,Computer science ,Network processing ,Ip processing ,General Earth and Planetary Sciences ,Multiprocessing ,Asynchronous I/O ,business ,computer ,General Environmental Science ,Computer network - Abstract
Applications requiring high-speed TCP/IP processing can easily saturate a modern server. We and others have previously suggested alleviating this problem in multiprocessor environments by dedicating a subset of the processors to perform network packet processing. The remaining processors perform only application computation, thus eliminating contention between these functions for processor resources. Applications interact with packet processing engines (PPEs) using an asynchronous I/O (AIO) programming interface which bypasses the operating system. A key attraction of this overall approach is that it exploits the architectural trend toward greater thread-level parallelism in future systems based on multi-core processors. In this paper, we conduct a detailed experimental performance analysis comparing this approach to a best-practice configured Linux baseline system. We have built a prototype system implementing this architecture, ETA+AIO (Embedded Transport Acceleration with Asynchronous I/O), and ported a high-performance web server to the AIO interface. Although the prototype uses modern single-core CPUs instead of future multi-core CPUs, an analysis of its performance can reveal important properties of this approach. Our experiments show that the ETA+AIO prototype has a modest advantage over the baseline Linux system in packet processing efficiency, consuming fewer CPU cycles to sustain the same throughput. This efficiency advantage enables the ETA+AIO prototype to achieve higher peak throughput than the baseline system, but only for workloads where the mix of packet processing and application processing approximately matches the allocation of CPUs in the ETA+AIO system, thereby enabling high utilization of all the CPUs.
Detailed analysis shows that the efficiency advantage of the ETA+AIO prototype, which uses one PPE CPU, comes from avoiding multiprocessing overheads in packet processing, lower overhead of our AIO interface compared to standard sockets, and reduced cache misses due to processor partitioning.
- Published
- 2006
37. Enabling Efficient Communications with Session Multipathing
- Author
-
Bruno Yuji Lino Kimura and Israel Luiz Borges Ribeiro
- Subjects
POSIX Threads ,Session layer ,Computer science ,Operating system ,CPU time ,Asynchronous I/O ,Session (computer science) ,epoll ,computer.software_genre ,computer ,Multipath TCP ,PATH (variable) - Abstract
In this paper we report an investigation of potential session multipathing strategies that leverage the easy deployment of user-space protocols. We devised an experimental session layer architecture to deal with multipathing and implemented it with four different libraries for asynchronous I/O processing on Linux systems: epoll, POSIX Threads, POSIX AIO, and Libev. We discuss the evaluation of these implementations (a) by comparing them with the general kernel-space solution, Multipath TCP (MPTCP), in an emulated network, and (b) by determining both the performance gain factor and the cost of resource consumption as a function of the number of paths in a session. We found that the Libev API, a full-featured and high-performance event loop, applied to session multipathing enables an average goodput gain factor of 1.62 per path added in a session, at a cost of 2.23% CPU utilization per path, and it requires no more than 4 MB of RAM regardless of the number of paths. We also observed that Libev-based multipathing achieves overall efficiency slightly higher than MPTCP.
- Published
- 2014
38. Afluentes Concurrent I/O Made Easy with Lazy Evaluation
- Author
-
Nelson Souto Rosa, Saulo Medeiros de Araujo, Silvio Romero de Lemos Meira, and Kiev Gama
- Subjects
Java ,Asynchronous communication ,Computer science ,Distributed computing ,Concurrency ,Function composition (computer science) ,Callback ,Asynchronous I/O ,Context (language use) ,Parallel computing ,Lazy evaluation ,computer ,computer.programming_language - Abstract
I/O-intensive systems can significantly reduce their total execution time by performing I/O operations concurrently. Despite this enormous potential, most systems perform I/O operations sequentially. One of the reasons is that the most widespread mechanisms for concurrent I/O are callback-based, resulting in code that is hard to write and maintain. In this context, we propose Afluentes, a Java framework that allows asynchronous functions to be composed just as easily as synchronous ones, facilitating the development of concurrent I/O-intensive systems. We performed an experimental evaluation whose results showed that Afluentes leads to significant performance gains over sequential I/O. Compared to callback-based approaches, Afluentes provides similar increases in performance, but with less programming effort.
- Published
- 2014
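The contrast the paper draws, composing asynchronous functions as easily as synchronous ones instead of nesting callbacks, looks like this in Python's asyncio, used here only as an analogy for Afluentes' Java API; the function names and payload strings are hypothetical.

```python
import asyncio

async def fetch(url):
    """Stand-in for an I/O-bound operation."""
    await asyncio.sleep(0.01)
    return "payload:" + url

async def merge(a, b):
    """Stand-in for a step that depends on two earlier results."""
    await asyncio.sleep(0.01)
    return a + "|" + b

async def composed():
    """Asynchronous functions composed like synchronous ones: both
    fetches run concurrently, and no callback nesting is needed."""
    a, b = await asyncio.gather(fetch("u1"), fetch("u2"))
    return await merge(a, b)
```

A callback version of the same pipeline would thread `merge` through completion handlers of both fetches; the composed form keeps the data flow readable while preserving the concurrency, which is precisely the usability gap Afluentes targets in Java.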
39. Survey on System I/O Hardware Transactions and Impact on Latency, Throughput, and Other Factors
- Author
-
Ben Lee and Steen Larsen
- Subjects
Input/output ,I/O Acceleration Technology ,business.industry ,Computer science ,Embedded system ,Multipath I/O ,Asynchronous I/O ,Memory-mapped I/O ,Central processing unit ,business ,Direct memory access ,Computer hardware ,Channel I/O - Abstract
Computer system input/output (I/O) has evolved with processor and memory technologies in terms of reducing latency, increasing bandwidth, and other factors. As requirements increase for I/O, such as networking, storage, and video, descriptor-based direct memory access (DMA) transactions have become more important in high-performance systems to move data between I/O adapters and system memory buffers. DMA transactions are done with hardware engines below the software protocol abstraction layers in all systems other than rudimentary embedded controllers. Central processing units (CPUs) can switch to other tasks by offloading hardware DMA transfers to the I/O adapters. Each I/O interface has one or more separately instantiated descriptor-based DMA engines optimized for a given I/O port. I/O transactions are optimized by accelerator functions to reduce latency, improve throughput, and reduce CPU overhead. This chapter surveys the current state of high-performance I/O architecture advances and explores benefits and limitations. With the proliferation of CPU multicores within a system, multi-GB/s ports, and on-die integration of system functions, changes beyond the techniques surveyed may be needed for optimal I/O architecture performance.
- Published
- 2014
40. TAI : threaded asynchronous I/O library for performance and portability
- Author
-
Liao, Tongliang
- Subjects
- Asynchronous I/O, C++17, Lock-free queue, Benchmark
- Abstract
In this paper, we investigate the behavior and performance of disk I/O using different types of libraries. We analyze the scenarios in which we can benefit from asynchronous I/O, and propose our cross-platform library design called TAI (Threaded Async I/O). TAI is designed to be a C++17 library with a developer-friendly API. Our benchmark shows it can outperform other libraries when asynchronous I/O is beneficial, and keeps competitive speed in other cases. It also demonstrates TAI's ability to achieve a 20% to 60% speedup on poorly scaling serial code through a simple library replacement.
- Published
- 2017
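The core shape of a threaded async I/O library like TAI can be sketched in a few dozen lines: callers enqueue write requests and keep computing while a worker thread drains the queue. This is a minimal sketch, not TAI's API; TAI uses lock-free queues, whereas this version uses a mutex-protected queue for brevity, and the actual disk write is replaced by a byte counter.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncWriter {
public:
    AsyncWriter() : worker_([this] { run(); }) {}
    ~AsyncWriter() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }
    void write(std::string data) {  // returns to the caller immediately
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(data));
        }
        cv_.notify_one();
    }
    void flush() {                  // block until the queue is drained
        std::unique_lock<std::mutex> lk(m_);
        drained_.wait(lk, [this] { return q_.empty(); });
    }
    std::size_t bytes_written() {
        std::lock_guard<std::mutex> lk(m_);
        return written_;
    }
private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return stop_ || !q_.empty(); });
            while (!q_.empty()) {
                written_ += q_.front().size();  // real code: write to disk here
                q_.pop();
            }
            drained_.notify_all();
            if (stop_) return;
        }
    }
    std::mutex m_;
    std::condition_variable cv_, drained_;
    std::queue<std::string> q_;
    std::size_t written_ = 0;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the state above
};
```

The speedup TAI reports on serial code comes from exactly this decoupling: `write` costs only an enqueue, so the caller never waits on the disk unless it explicitly calls `flush`.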
41. Scalable Service Composition Execution through Asynchronous I/O
- Author
-
Pierpaolo Baglietto, Michele Stecca, Massimo Maresca, and Martino Fornasa
- Subjects
Service (systems architecture) ,Computer science ,business.industry ,Software as a service ,Distributed computing ,Cloud computing ,computer.software_genre ,Utility computing ,Asynchronous communication ,Asynchronous I/O ,Mashup ,Data as a service ,business ,computer - Abstract
In the last few years, different solutions have been proposed for the composition of Web APIs. In this paper we focus on the scalability problems that appear when the software platform in charge of executing Service Compositions (which are defined as Directed Acyclic Graphs, DAGs) must support huge numbers of concurrent executions. This is the case for viral applications as well as for Cloud Computing scenarios in which Service Compositions are deployed and executed on multi-tenant platforms implementing paradigms such as Business Process as a Service, Mashup as a Service, and Service Composition as a Service. The proposed solution exploits the Asynchronous I/O paradigm for the efficient utilization of system resources such as threads and memory.
- Published
- 2013
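The gain from the asynchronous paradigm in this setting is that independent nodes of a composition DAG can have their remote calls in flight simultaneously instead of occupying one blocked thread each. A rough sketch of that fan-out, with a hypothetical `invoke_service` standing in for an HTTP request (the paper's platform details are not shown here):

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a remote Web API call; the sleep fakes network latency
// and the return value fakes a response payload size.
int invoke_service(const std::string& name) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return static_cast<int>(name.size());
}

// Issue all independent calls up front, then gather the responses, so
// the latencies overlap instead of adding up.
int fan_out(const std::vector<std::string>& services) {
    std::vector<std::future<int>> calls;
    for (const auto& s : services)
        calls.push_back(std::async(std::launch::async, invoke_service, s));
    int total = 0;
    for (auto& c : calls)
        total += c.get();
    return total;
}
```

Note the hedge: `std::async` still consumes a thread per call, so a production platform of the kind the paper describes would sit on a true non-blocking socket layer; the sketch only shows the overlap structure.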
42. Optimizing virtual machine live storage migration in heterogeneous storage environment
- Author
-
Ruijin Zhou, Chao Li, Tao Li, and Fang Liu
- Subjects
Computer science ,Virtual machine ,business.industry ,Converged storage ,Embedded system ,Redundancy (engineering) ,Asynchronous I/O ,Cloud computing ,computer.software_genre ,business ,Virtualization ,Solid-state drive ,computer - Abstract
Virtual machine (VM) live storage migration techniques significantly increase the mobility and manageability of virtual machines in the era of cloud computing. On the other hand, as solid state drives (SSDs) become increasingly popular in data centers, VM live storage migration will inevitably encounter heterogeneous storage environments. Nevertheless, conventional migration mechanisms do not consider the speed discrepancy and SSD's wear-out issue, which not only causes significant performance degradation but also shortens SSD's lifetime. This paper, for the first time, addresses the efficiency of VM live storage migration in heterogeneous storage environments from a multi-dimensional perspective, i.e., user experience, device wearing, and manageability. We derive a flexible metric (migration cost), which captures various design preferences. Based on that, we propose and prototype three new storage migration strategies, namely: 1) Low Redundancy (LR), which generates the least amount of redundant writes; 2) Source-based Low Redundancy (SLR), which keeps the balance between IO performance and write redundancy; and 3) Asynchronous IO Mirroring, which seeks the highest IO performance. The evaluation of our prototyped system shows that our techniques outperform existing live storage migration by a significant margin. Furthermore, by adaptively mixing our proposed schemes, the cost of massive VM live storage migration can be even lower than that of using only the best individual mechanism.
- Published
- 2013
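The third strategy, Asynchronous IO Mirroring, amounts to issuing each guest write to the source and destination disks in parallel and acknowledging it only when both copies land. A minimal sketch of that pattern, with two vectors standing in for the disks (the paper's prototype of course operates on real block devices):

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

using Disk = std::vector<std::string>;  // toy stand-in for a block device

// Mirror one block to both disks concurrently; completion is reported
// only after both asynchronous writes have finished.
std::size_t mirror_write(Disk& src, Disk& dst, const std::string& block) {
    auto a = std::async(std::launch::async, [&] { src.push_back(block); });
    auto b = std::async(std::launch::async, [&] { dst.push_back(block); });
    a.get();
    b.get();  // acknowledge only when both copies land
    return block.size();
}
```

Because both writes proceed concurrently, the guest sees roughly the latency of the slower device rather than the sum of the two, which is why this variant targets the highest IO performance at the cost of doubled write traffic.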
43. Comparative Analysis of Asynchronous I/O in Multithreaded UNIX
- Author
-
H. Chuck Yoo
- Subjects
Unix ,Computer science ,Multiprocessing ,Thread (computing) ,Parallel computing ,computer.software_genre ,Asynchronous communication ,Multithreading ,Operating system ,Overhead (computing) ,Asynchronous I/O ,computer ,Software ,Block (data storage) - Abstract
I/O operations in UNIX are inherently synchronous. The need for asynchronous I/O comes first from multithreaded applications, where threads cannot block for I/O, and second from the fact that asynchronous I/O has much less overhead than synchronous I/O. There are two main approaches to accomplishing asynchronous I/O in UNIX. We compare the two approaches in design and implementation, and report the results of extensive experiments measuring their performance differences.
- Published
- 1996
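One of the two classic approaches the paper compares is emulating asynchrony with helper threads: a thread performs the blocking read on the application's behalf while the caller overlaps computation, turning a synchronous call into a request/completion pair. A sketch with modern C++ primitives (the paper itself predates these and works at the UNIX kernel/library level):

```cpp
#include <cassert>
#include <fstream>
#include <future>
#include <sstream>
#include <string>

// Thread-emulated asynchronous read: the blocking file read runs on a
// helper thread; the returned future is the completion handle.
std::future<std::string> async_read(const std::string& path) {
    return std::async(std::launch::async, [path] {
        std::ifstream in(path, std::ios::binary);
        std::ostringstream buf;
        buf << in.rdbuf();  // blocking read happens on the helper thread
        return buf.str();
    });
}
```

The alternative approach in UNIX is kernel-level asynchronous I/O with completion notification (as later standardized in POSIX `aio_read`/`aio_write`); the trade-off the paper measures is thread-switch overhead versus signal/notification complexity.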
44. Input and Output
- Author
-
Walter Spector and Norman S. Clerman
- Subjects
Computer science ,Programming language ,Fortran ,Asynchronous I/O ,Logical unit number ,computer.software_genre ,computer ,computer.programming_language ,Scientific software - Published
- 2011
45. Interval Based I/O: A New Approach to Providing High Performance Parallel I/O
- Author
-
Phillip M. Dickens and Jeremy Logan
- Subjects
Input/output ,Parallel processing (DSP implementation) ,Computer science ,POSIX ,Distributed computing ,Scalability ,Data-intensive computing ,Interval (graph theory) ,Asynchronous I/O ,Parallel computing ,Supercomputer ,Parallel I/O - Abstract
Providing scalable, high-performance parallel I/O for data-intensive computations is beset by a number of difficult challenges. The most often cited difficulties include the non-contiguous I/O patterns prominent in scientific codes, the lack of support for parallel I/O optimizations in POSIX, the high cost of providing strict file consistency semantics, and the cost of accessing storage devices over a network. We believe, however, that a more fundamental problem is the legacy view of a file as a linear sequence of bytes. To address this issue, we are developing a new approach to parallel I/O that is based on what we term intervals and interval files. This paper provides an overview of the interval-IO system and a set of benchmarks demonstrating the power of this new approach.
- Published
- 2011
46. Scaling Instant Messaging communication services: A comparison of blocking and non-blocking techniques
- Author
-
Dmitri Botvich, Kieran Ryan, Leigh Griffin, and Eamonn de Leastar
- Subjects
Service (systems architecture) ,Java ,Computer science ,Scala ,Concurrency ,Distributed computing ,Interoperability ,0102 computer and information sciences ,02 engineering and technology ,JavaScript ,computer.software_genre ,01 natural sciences ,World Wide Web ,Concurrency control ,Server ,0202 electrical engineering, electronic engineering, information engineering ,Concurrent computing ,computer.programming_language ,business.industry ,020206 networking & telecommunications ,Walton Institute for Information and Communications Systems Science ,Executor ,Blocking (computing) ,010201 computation theory & mathematics ,Virtual machine ,Communications Infrastructure Management ,Asynchronous I/O ,Telecommunications Software and Systems Group ,business ,computer ,Software ,Computer network - Abstract
Designing innovative communications services that scale to facilitate potential new usage patterns can pose significant challenges. This is particularly the case if these services are to be delivered over existing protocols and interoperate with legacy services. This work explores design choices for such a service: large scale message delivery to existing Instant Messaging users. In particular we explore message throughput, accuracy and server load for several alternative implementation strategies. These strategies focus on approaches to concurrency, with best practice in current and emerging techniques thoroughly benchmarked. Specifically, a conventional Java Executor approach is compared with a functional approach realised through Scala and its Actors framework. These could be termed “blocking I/O” technologies. A third approach has also been measured: a “non-blocking I/O” approach based on an alternative to the Java Virtual Machine, employing Node.js and JavaScript. We believe that some of the results are startling.
- Published
- 2011
47. AC: Composable asynchronous IO for native languages
- Author
-
Ross McIlroy, Tim Harris, Rebecca Isaacs, and Martín Abadi
- Subjects
Programming language ,Computer science ,Message passing ,020207 software engineering ,02 engineering and technology ,Thread (computing) ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Operational semantics ,Asynchronous communication ,0202 electrical engineering, electronic engineering, information engineering ,Programming paradigm ,Callback ,020201 artificial intelligence & image processing ,Asynchronous I/O ,Implementation ,computer ,Core language ,Language construct ,Software - Abstract
This paper introduces AC, a set of language constructs for composable asynchronous IO in native languages such as C/C++. Unlike traditional synchronous IO interfaces, AC lets a thread issue multiple IO requests so that they can be serviced concurrently, and so that long-latency operations can be overlapped with computation. Unlike traditional asynchronous IO interfaces, AC retains a sequential style of programming without requiring code to use multiple threads, and without requiring code to be "stack-ripped" into chains of callbacks. AC provides an "async" statement to identify opportunities for IO operations to be issued concurrently, a "do..finish" block that waits until any enclosed "async" work is complete, and a "cancel" statement that requests cancellation of unfinished IO within an enclosing "do..finish". We give an operational semantics for a core language. We describe and evaluate implementations that are integrated with message passing on the Barrelfish research OS, and integrated with asynchronous file and network IO on Microsoft Windows. We show that AC offers comparable performance to existing C/C++ interfaces for asynchronous IO, while providing a simpler programming model.
- Published
- 2011
- Full Text
- View/download PDF
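AC's `async` and `do..finish` are language constructs, but their shape can be approximated in plain C++ to make the abstract concrete: each `async` statement becomes a future collected by a scope object whose destructor waits for all enclosed work, just as `do..finish` does. This is an illustrative approximation, not AC's implementation, and it omits AC's `cancel` support:

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Approximation of AC's do..finish block: work launched via async() is
// joined when the Finish object goes out of scope.
class Finish {
public:
    template <typename F>
    void async(F&& f) {  // plays the role of AC's "async" statement
        work_.push_back(std::async(std::launch::async, std::forward<F>(f)));
    }
    ~Finish() {          // plays the role of the "finish" join point
        for (auto& w : work_) w.get();
    }
private:
    std::vector<std::future<void>> work_;
};
```

What AC adds over this sketch is precisely what futures lack: a sequential programming style without stack-ripped callbacks or extra threads, plus structured cancellation of unfinished IO within the enclosing scope.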
48. Boost C++ Libraries
- Author
-
Sandeep Koranne
- Subjects
ComputingMethodologies_PATTERNRECOGNITION ,Programming language ,Computer science ,Graph traversal ,Memory pool ,Smart pointer ,Asynchronous I/O ,Python (programming language) ,computer.software_genre ,Data structure ,computer ,computer.programming_language - Abstract
In this chapter we discuss the Boost C++ API. Boost is a peer-reviewed C++ class library which implements many interesting and useful data structures and algorithms. In particular we discuss the use of Boost smart pointers, Boost asynchronous IO, and IO Streams. Boost also implements many data structures which are not present in the C++ standard library (e.g. bimap). Boost Graph Library (BGL) is presented with the help of a real-life example. We compare Boost multi-threading and memory pool performance to APR. We discuss the integration of Python with C++ using Boost. We conclude the chapter with a discussion of Boost Generic Image Processing Library.
- Published
- 2010
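The smart-pointer ownership model the chapter covers can be shown briefly. `boost::shared_ptr` and `boost::weak_ptr` were later standardized, and the `std::` versions below behave the same way: shared owners keep the object alive, weak observers do not. The `Node`/`demo` names are illustrative only.

```cpp
#include <cassert>
#include <memory>

struct Node {
    int value;
    explicit Node(int v) : value(v) {}
};

int demo() {
    std::weak_ptr<Node> observer;
    int seen = 0;
    {
        auto owner = std::make_shared<Node>(42);
        observer = owner;                   // observing, not owning
        if (auto locked = observer.lock())  // lock() succeeds while owned
            seen = locked->value;
    }                                        // last shared_ptr dropped here
    return observer.expired() ? seen : -1;   // weak_ptr now dangles safely
}
```

The weak pointer is what breaks reference cycles in graph-like structures such as those built with BGL, which is why the two pointer types are usually presented together.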
49. Automated Tracing of I/O Stack
- Author
-
Mahmut Kandemir, Seung Woo Son, Wei-keng Liao, Ramya Prabhakar, Yuanrui Zhang, Christina M. Patrick, Seong Jo Kim, and Alok Choudhary
- Subjects
Hierarchy (mathematics) ,Computer science ,business.industry ,Device file ,Tracing ,computer.software_genre ,Parallel I/O ,Stack (abstract data type) ,Computer data storage ,Operating system ,Asynchronous I/O ,business ,computer ,TRACE (psycholinguistics) - Abstract
Efficient execution of parallel scientific applications requires high-performance storage systems designed to meet their I/O requirements. Most high-performance I/O intensive applications access multiple layers of the storage stack during their disk operations. A typical I/O request from these applications may include accesses to high-level libraries such as MPI I/O, executing on clustered parallel file systems like PVFS2, which are in turn supported by native file systems like Linux. In order to design and implement parallel applications that exercise this I/O stack, it is important to understand the flow of I/O calls through the entire storage system. Such understanding helps in identifying the potential performance and power bottlenecks in different layers of the storage hierarchy. To trace the execution of the I/O calls and to understand the complex interactions of multiple user-libraries and file systems, we propose an automatic code instrumentation technique, which enables us to collect detailed statistics of the I/O stack. Our proposed I/O tracing tool traces the flow of I/O calls across different layers of an I/O stack, and can be configured to work with different file systems and user-libraries. It also analyzes the collected information to generate output in terms of different user-specified metrics of interest.
- Published
- 2010
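The instrumentation idea in this paper, stripped to its essence, is a recording wrapper interposed at each layer of the I/O stack so that per-layer call statistics can be reported afterwards. The paper does this automatically over real MPI-IO/PVFS2/Linux layers; the hand-rolled miniature below only illustrates the shape, and its names are not the tool's API:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// A tracing shim for one layer of an I/O stack: every call is counted
// before being forwarded, so statistics can be queried per operation.
class TracingLayer {
public:
    explicit TracingLayer(std::string name) : name_(std::move(name)) {}
    std::size_t write(const std::string& data) {
        ++calls_["write"];   // record the call before forwarding
        return data.size();  // real code: forward to the lower layer here
    }
    std::size_t count(const std::string& op) const {
        auto it = calls_.find(op);
        return it == calls_.end() ? 0 : it->second;
    }
private:
    std::string name_;  // e.g. "MPI-IO", "PVFS2", "ext4"
    std::map<std::string, std::size_t> calls_;
};
```

Stacking one such shim per layer is what lets a trace follow a single application request as it fans out into library, parallel-file-system, and kernel calls.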
50. CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters
- Author
-
Dimitrios S. Nikolopoulos, Benjamin Rose, Ali R. Butt, and M. Mustafa Rafique
- Subjects
Multi-core processor ,Speedup ,Computer science ,Distributed computing ,Multiple buffering ,Parallel computing ,Supercomputer ,Scheduling (computing) ,Computational Theory and Mathematics ,Hardware and Architecture ,Asynchronous communication ,Scalability ,Resource allocation ,Resource management ,Asynchronous I/O ,Software - Abstract
The use of asymmetric multi-core processors with on-chip computational accelerators is becoming common in a variety of environments ranging from scientific computing to enterprise applications. The focus of current research has been on making efficient use of individual systems, and porting applications to asymmetric processors. In this paper, we take the next step by investigating the use of multi-core-based systems, especially the popular Cell processor, in a cluster setting. We present CellMR, an efficient and scalable implementation of the MapReduce framework for asymmetric Cell-based clusters. The novelty of CellMR lies in its adoption of a streaming approach to supporting MapReduce, and its adaptive resource scheduling schemes: Instead of allocating workloads to the components once, CellMR slices the input into small work units and streams them to the asymmetric nodes for efficient processing. Moreover, CellMR removes I/O bottlenecks by design, using a number of techniques, such as double-buffering and asynchronous I/O, to maximize cluster performance. Our evaluation of CellMR using typical MapReduce applications shows that it achieves 50.5% better performance compared to the standard nonstreaming approach, introduces a very small overhead on the manager irrespective of application input size, scales almost linearly with increasing number of compute nodes (a speedup of 6.9 on average, when using eight nodes compared to a single node), and effectively adapts the parameters of its resource management policy between applications with varying computation density.
- Published
- 2009
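Double buffering, one of the techniques CellMR uses to remove I/O bottlenecks, overlaps fetching the next work unit with processing the current one: while the worker computes on one buffer, the other is filled in the background, then the roles swap. A minimal sketch under the assumption that "fetching" is just copying a work unit (real code would read from the network or disk):

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Process a stream of work units with double buffering: the prefetch of
// unit i+1 runs on a helper thread while unit i is being summed.
int process_stream(const std::vector<std::vector<int>>& units) {
    int total = 0;
    if (units.empty()) return total;
    std::vector<int> current = units[0];
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::future<std::vector<int>> next;
        if (i + 1 < units.size())  // prefetch while we compute
            next = std::async(std::launch::async,
                              [&units, i] { return units[i + 1]; });
        total += std::accumulate(current.begin(), current.end(), 0);
        if (next.valid()) current = next.get();  // swap buffers
    }
    return total;
}
```

When fetch and compute times are comparable, this hides almost all of the I/O latency, which is the effect CellMR's streaming design relies on.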