HFAA

Authors :
Jeffrey Shafer
Adam Yee
Source :
Proceedings of the 2nd Workshop on Architectures and Systems for Big Data.
Publication Year :
2012
Publisher :
ACM, 2012.

Abstract

Hadoop is an open-source implementation of the MapReduce programming model for distributed computing. Hadoop natively integrates with the Hadoop Distributed File System (HDFS), a user-level file system. In this paper, we introduce the Hadoop Filesystem Agnostic API (HFAA) to allow Hadoop to integrate with any distributed file system over TCP sockets. With this API, HDFS can be replaced by distributed file systems such as PVFS, Ceph, Lustre, or others, thereby allowing direct comparisons in terms of performance and scalability. Unlike previous attempts at augmenting Hadoop with new file systems, the socket API presented here eliminates the need to customize Hadoop's Java implementation, and instead moves the implementation responsibilities to the file system itself. Thus, developers wishing to integrate their new file system with Hadoop are not responsible for understanding details of Hadoop's internal operation. In this paper, an initial implementation of HFAA is used to replace HDFS with PVFS, a file system popular in high-performance computing environments. Compared with an alternate method of integrating with PVFS (a POSIX kernel interface), HFAA increases write and read throughput by 23% and 7%, respectively.

Details

Database :
OpenAIRE
Journal :
Proceedings of the 2nd Workshop on Architectures and Systems for Big Data
Accession number :
edsair.doi...........a6b0cfd006ee2cac25e3ebc8ec0995b3
Full Text :
https://doi.org/10.1145/2379436.2379439