Author: "Racah, Evan" / Database: eScholarship - Searchworks@Jio Institute Digital Library Search Results

1. Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies:

Author: Gittens, Alex, Devarakonda, Aditya, Racah, Evan, Ringenburg, Michael, Gerhardt, Lisa, Kottalam, Jey, Liu, Jialin, Maschhoff, Kristyn, Canon, Shane, Chhugani, Jatin, Sharma, Pramod, Yang, Jiyan, Demmel, James, Harrell, Jim, Krishnamurthy, Venkat, Mahoney, Michael W., and Prabhat, Mr
Abstract: We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely-used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall-and-skinny, which enables the algorithms to map conveniently into Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
Published: 2016

2. PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures:

Author: Patwary, Md. Mostofa Ali, Satish, Nadathur, Sundaram, Narayanan, Liu, Jialin, Sadowski, Peter, Racah, Evan, Byna, Suren, Tull, Craig, Bhimji, Wahid, Prabhat, Mr., and Dubey, Pradeep
Abstract: Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based O(log n) algorithms have been proposed for computing KNN, due to its inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications: astrophysics, plasma physics, and particle physics, we show that our implementation can construct a kd-tree of 189 billion particles in 48 seconds using 50,000 cores. We also demonstrate computation of KNN of 19 billion queries in 12 seconds. We demonstrate almost linear speedup both for shared and distributed memory computers. Our algorithms outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems.
Published: 2016

3. A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark:

Author: Gittens, Alex, Kottalam, Jey, Yang, Jiyan, Ringenburg, Michael F., Chhugani, Jatin, Racah, Evan, Singh, Mohitdeep, Yao, Yushu, Fischer, Curt, Ruebel, Oliver, Bowen, Benjamin, Lewis, Norman G., Mahoney, Michael W., Krishnamurthy, Venkat, and Prabhat, Mr
Abstract: We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.
Published: 2016

4. Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies:

Author: Gittens, Alex, Devarakonda, Aditya, Racah, Evan, Ringenburg, Michael, Gerhardt, Lisa, Kottalam, Jey, Liu, Jialin, Maschhoff, Kristyn, Canon, Shane, Chhugani, Jatin, Sharma, Pramod, Yang, Jiyan, Demmel, James, Harrell, Jim, Krishnamurthy, Venkat, Mahoney, Michael, and Prabhat, Mr
Abstract: We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely-used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to 1.6TB particle physics, 2.2TB and 16TB climate modeling and 1.1TB bioimaging data. The data matrices are tall-and-skinny, which enables the algorithms to map conveniently into Spark’s data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
Published: 2016

5. PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures.

Author: Patwary, Md Mostofa Ali, Satish, Nadathur Rajagopalan, Sundaram, Narayanan, Liu, Jialin, Sadowski, Peter J, Racah, Evan, Byna, Surendra, Tull, Craig, Bhimji, Wahid, Prabhat, and Dubey, Pradeep
Published: 2016

6. PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures.

Author: Patwary, Md Mostofa Ali, Satish, Nadathur Rajagopalan, Sundaram, Narayanan, Liu, Jialin, Sadowski, Peter J, Racah, Evan, Byna, Surendra, Tull, Craig, Bhimji, Wahid, Prabhat, and Dubey, Pradeep
Published: 2016

7. Revealing Fundamental Physics from the Daya Bay Neutrino Experiment using Deep Neural Networks

Author: Racah, Evan, Ko, Seyoon, Sadowski, Peter, Bhimji, Wahid, Tull, Craig, Oh, Sang-Yun, Baldi, Pierre, and Prabhat
Subjects: Deep Learning, Unsupervised Learning, High-Energy Physics, Autoencoders, stat.ML, cs.LG, physics.data-an
Abstract: Experiments in particle physics produce enormous quantities of data that must be analyzed and interpreted by teams of physicists. This analysis is often exploratory, where scientists are unable to enumerate the possible types of signal prior to performing the experiment. Thus, tools for summarizing, clustering, visualizing and classifying high-dimensional data are essential. In this work, we show that meaningful physical content can be revealed by transforming the raw data into a learned high-level representation using deep neural networks, with measurements taken at the Daya Bay Neutrino Experiment as a case study. We further show how convolutional deep neural networks can provide an effective classification filter with greater than 97% accuracy across different classes of physics events, significantly better than other machine learning approaches.
Published: 2016
