Automated Parallel Data Processing Engine with Application to Large-Scale Feature Extraction

Authors :: Jonathan B. Ajo-Franklin
Kesheng Wu
Bin Dong
Xin Xing
Source :: 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC).
Publication Year :: 2018
Publisher :: IEEE, 2018.
Abstract: As new scientific instruments generate ever more data, we need to parallelize advanced data analysis algorithms such as machine learning to harness the available computing power. The success of commercial Big Data systems demonstrated that it is possible to automatically parallelize many algorithms. However, these Big Data tools have trouble handling the complex analysis operations from scientific applications. To overcome this difficulty, we have started to build an automated parallel data processing engine for science, known as ARRAYUDF. This paper provides an overview of this data processing engine, and a use case involving a feature extraction task from a large-scale seismic recording technology, called distributed acoustic sensing (DAS). The key challenge associated with DAS data sets is that they are vast in volume and noisy in data quality. The existing methods used by the DAS team for extracting useful signals like traveling seismic waves are complex and very time-consuming. Our parallel data processing engine reduces the job execution time from 10s of hours to 10s of seconds, and achieves 95% parallelization efficiency. ARRAYUDF could be used to implement more advanced data processing algorithms including machine learning, and could work with many more applications.

Subjects :: Scientific instrument
Data processing
Task (computing)
Computer engineering
Computer science
Data quality
Feature extraction
Big data
Key (cryptography)
Distributed acoustic sensing

Details

Database :: OpenAIRE
Journal :: 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC)
Accession number :: edsair.doi...........a6a292c55dc908514e0134a4f3481fa8
Full Text :: https://doi.org/10.1109/mlhpc.2018.8638638

