Author: "Park, Jongsoo" / Publisher: arxiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Park, Jongsoo"' showing total 8 results

1. MTrainS: Improving DLRM training efficiency using heterogeneous memories

Author: Kassa, Hiwot Tadese, Johnson, Paul, Akers, Jason, Ghosh, Mrinmoy, Tulloch, Andrew, Mudigere, Dheevatsa, Park, Jongsoo, Liu, Xing, Dreslinski, Ronald, and Ardestani, Ehsan K.
Subjects: Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Performance, Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG)
Abstract: Recommendation models are very large, requiring terabytes (TB) of memory during training. In pursuit of better quality, the model size and complexity grow over time, which requires additional training data to avoid overfitting. This model growth demands a large number of resources in data centers. Hence, training efficiency is becoming considerably more important to keep the data center power demand manageable. In Deep Learning Recommendation Models (DLRM), sparse features capturing categorical inputs through embedding tables are the major contributors to model size and require high memory bandwidth. In this paper, we study the bandwidth requirement and locality of embedding tables in real-world deployed models. We observe that the bandwidth requirement is not uniform across different tables and that embedding tables show high temporal locality. We then design MTrainS, which leverages heterogeneous memory, including byte and block addressable Storage Class Memory for DLRM hierarchically. MTrainS allows for higher memory capacity per node and increases training efficiency by lowering the need to scale out to multiple hosts in memory capacity bound use cases. By optimizing the platform memory hierarchy, we reduce the number of nodes for training by 4-8X, saving power and cost of training while meeting our target training performance.
Published: 2023

2. DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction

Author: Zhang, Buyun, Luo, Liang, Liu, Xi, Li, Jay, Chen, Zeliang, Zhang, Weilin, Wei, Xiaohan, Hao, Yuchen, Tsang, Michael, Wang, Wenjun, Liu, Yang, Li, Huayu, Badr, Yasmine, Park, Jongsoo, Yang, Jiyan, Mudigere, Dheevatsa, and Wen, Ellie
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG)
Abstract: Learning feature interactions is important to the model performance of online advertising services. As a result, extensive efforts have been devoted to designing effective architectures to learn feature interactions. However, we observe that the practical performance of those designs can vary from dataset to dataset, even when the order of interactions claimed to be captured is the same. That indicates different designs may have different advantages and the interactions captured by them have non-overlapping information. Motivated by this observation, we propose DHEN - a deep and hierarchical ensemble architecture that can leverage strengths of heterogeneous interaction modules and learn a hierarchy of the interactions under different orders. To overcome the challenge brought by DHEN's deeper and multi-layer structure in training, we propose a novel co-designed training system that can further improve the training efficiency of DHEN. Experiments of DHEN on a large-scale dataset from CTR prediction tasks attained 0.27% improvement on the Normalized Entropy (NE) of prediction and 1.2x better training throughput than state-of-the-art baseline, demonstrating their effectiveness in practice.
Published: 2022

3. Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale

Author: Zhaoxia, Deng, Park, Jongsoo, Tang, Ping Tak Peter, Liu, Haixin, Jie, Yang, Yuen, Hector, Huang, Jianyu, Khudia, Daya, Wei, Xiaohan, Wen, Ellie, Choudhary, Dhruv, Krishnamoorthi, Raghuraman, Wu, Carole-Jean, Nadathur, Satish, Kim, Changkyu, Naumov, Maxim, Naghshineh, Sam, and Smelyanskiy, Mikhail
Subjects: Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Performance, Hardware Architecture (cs.AR), FOS: Mathematics, Mathematics - Numerical Analysis, Numerical Analysis (math.NA), Computer Science - Hardware Architecture, Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG)
Abstract: Tremendous success of machine learning (ML) and the unabated growth in ML model complexity motivated many ML-specific designs in both CPU and accelerator architectures to speed up the model inference. While these architectures are diverse, highly optimized low-precision arithmetic is a component shared by most. Impressive compute throughputs are indeed often exhibited by these architectures on benchmark ML models. Nevertheless, production models such as recommendation systems important to Facebook's personalization services are demanding and complex: These systems must serve billions of users per month responsively with low latency while maintaining high prediction accuracy, notwithstanding computations with many tens of billions of parameters per inference. Do these low-precision architectures work well with our production recommendation systems? They do. But not without significant effort. We share in this paper our search strategies to adapt reference recommendation models to low-precision hardware, our optimization of low-precision compute kernels, and the design and development of tool chain so as to maintain our models' accuracy throughout their lifespan during which topic trends and users' interests inevitably evolve. Practicing these low-precision technologies helped us save datacenter capacities while deploying models with up to 5X complexity that would otherwise not be deployed on traditional general-purpose CPUs. We believe these lessons from the trenches promote better co-design between hardware architecture and software engineering and advance the state of the art of ML in industry.
Published: 2021

4. First-Generation Inference Accelerator Deployment at Facebook

Author: Anderson, Michael, Chen, Benny, Chen, Stephen, Deng, Summer, Fix, Jordan, Gschwind, Michael, Kalaiah, Aravind, Kim, Changkyu, Lee, Jaewon, Liang, Jason, Liu, Haixin, Lu, Yinghai, Montgomery, Jack, Moorthy, Arun, Nadathur, Satish, Naghshineh, Sam, Nayak, Avinash, Park, Jongsoo, Petersen, Chris, Schatz, Martin, Sundaram, Narayanan, Tang, Bangsheng, Tang, Peter, Yang, Amy, Yu, Jiecao, Yuen, Hector, Zhang, Ying, Anbudurai, Aravind, Balan, Vandana, Bojja, Harsha, Boyd, Joe, Breitbach, Matthew, Caldato, Claudio, Calvo, Anna, Catron, Garret, Chandwani, Sneh, Christeas, Panos, Cottel, Brad, Coutinho, Brian, Dalli, Arun, Dhanotia, Abhishek, Duncan, Oniel, Dzhabarov, Roman, Elmir, Simon, Fu, Chunli, Fu, Wenyin, Fulthorp, Michael, Gangidi, Adi, Gibson, Nick, Gordon, Sean, Hernandez, Beatriz Padilla, Ho, Daniel, Huang, Yu-Cheng, Johansson, Olof, Juluri, Shishir, Kanaujia, Shobhit, Kesarkar, Manali, Killinger, Jonathan, Kim, Ben, Kulkarni, Rohan, Lele, Meghan, Li, Huayu, Li, Huamin, Li, Yueming, Liu, Cynthia, Liu, Jerry, Maher, Bert, Mallipedi, Chandra, Mangla, Seema, Matam, Kiran Kumar, Mehta, Jubin, Mehta, Shobhit, Mitchell, Christopher, Muthiah, Bharath, Nagarkatte, Nitin, Narasimha, Ashwin, Nguyen, Bernard, Ortiz, Thiara, Padmanabha, Soumya, Pan, Deng, Poojary, Ashwin, Raginel, Olivier, Rajagopal, Dwarak, Rice, Tristan, Ross, Craig, Rotem, Nadav, Russ, Scott, Shah, Kushal, Shan, Baohua, Shen, Hao, Shetty, Pavan, Skandakumaran, Krish, Srinivasan, Kutta, Sumbaly, Roshan, Tauberg, Michael, Tzur, Mor, Verma, Sidharth, Wang, Hao, Wang, Man, Wei, Ben, Xia, Alex, Xu, Chenyu, Yang, Martin, Zhang, Kai, Zhang, Ruoxi, Zhao, Ming, Zhao, Whitney, Zhu, Rui, Mathews, Ajit, Qiao, Lin, Smelyanskiy, Misha, Jia, Bill, and Rao, Vijay
Subjects: FOS: Computer and information sciences, Hardware Architecture (cs.AR), Computer Science - Hardware Architecture
Abstract: In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through Open Compute Platform (OCP), and software framework and tooling, through PyTorch/Caffe2/Glow. A characteristic of this ecosystem from the start is its openness to enable a variety of AI accelerators from different vendors. This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. We describe various performance optimizations, at both platform and accelerator level, which enable this platform to serve production traffic at Facebook. We also share deployment challenges, lessons learned during performance optimization, as well as provide guidance for future inference hardware co-design.
Published: 2021

5. Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data

Author: Ye, Mao, Choudhary, Dhruv, Yu, Jiecao, Wen, Ellie, Chen, Zeliang, Yang, Jiyan, Park, Jongsoo, Liu, Qiang, and Kejariwal, Arun
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: Large scale deep learning provides a tremendous opportunity to improve the quality of content recommendation systems by employing both wider and deeper models, but this comes at great infrastructural cost and carbon footprint in modern data centers. Pruning is an effective technique that reduces both memory and compute demand for model inference. However, pruning for online recommendation systems is challenging due to the continuous data distribution shift (a.k.a. non-stationary data). Although incremental training on the full model is able to adapt to the non-stationary data, directly applying it on the pruned model leads to accuracy loss. This is because the sparsity pattern after pruning requires adjustment to learn new patterns. To the best of our knowledge, this is the first work to provide in-depth analysis and discussion of applying pruning to online recommendation systems with non-stationary data distribution. Overall, this work makes the following contributions: 1) We present an adaptive dense to sparse paradigm equipped with a novel pruning algorithm for pruning a large scale recommendation system with non-stationary data distribution; 2) We design the pruning algorithm to automatically learn the sparsity across layers to avoid repeating hand-tuning, which is critical for pruning the heterogeneous architectures of recommendation systems trained with non-stationary data.
Published: 2020

6. Deep Learning Recommendation Model for Personalization and Recommendation Systems

Author: Naumov, Maxim, Mudigere, Dheevatsa, Shi, Hao-Jun Michael, Huang, Jianyu, Sundaraman, Narayanan, Park, Jongsoo, Wang, Xiaodong, Gupta, Udit, Wu, Carole-Jean, Azzolini, Alisson G., Dzhulgakov, Dmytro, Mallevich, Andrey, Cherniavskii, Ilia, Lu, Yinghai, Krishnamoorthi, Raghuraman, Yu, Ansha, Kondratenko, Volodymyr, Pereira, Stephanie, Chen, Xianjie, Chen, Wenlin, Rao, Vijay, Jia, Bill, Xiong, Liang, and Smelyanskiy, Misha
Subjects: H.3.4, FOS: Computer and information sciences, Computer Science - Machine Learning, I.5.0, I.2.6, H.3.3, Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG), 68T05
Abstract: With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.
Comment: 10 pages, 6 figures
Published: 2019

7. Spatial-Winograd Pruning Enabling Sparse Winograd Convolution

Author: Yu, Jiecao, Park, Jongsoo, and Naumov, Maxim
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing, Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: Deep convolutional neural networks (CNNs) are deployed in various applications but demand immense computational requirements. Pruning techniques and Winograd convolution are two typical methods to reduce the CNN computation. However, they cannot be directly combined because Winograd transformation fills in the sparsity resulting from pruning. Li et al. (2017) propose sparse Winograd convolution in which weights are directly pruned in the Winograd domain, but this technique is not very practical because Winograd-domain retraining requires low learning rates and hence significantly longer training time. Besides, Liu et al. (2018) move the ReLU function into the Winograd domain, which can help increase the weight sparsity but requires changes in the network structure. To achieve a high Winograd-domain weight sparsity without changing network structures, we propose a new pruning method, spatial-Winograd pruning. As the first step, spatial-domain weights are pruned in a structured way, which efficiently transfers the spatial-domain sparsity into the Winograd domain and avoids Winograd-domain retraining. For the next step, we also perform pruning and retraining directly in the Winograd domain but propose to use an importance factor matrix to adjust weight importance and weight gradients. This adjustment makes it possible to effectively retrain the pruned Winograd-domain network without changing the network structure. For the three models on the datasets of CIFAR-10, CIFAR-100, and ImageNet, our proposed method can achieve the Winograd domain sparsities of 63%, 50%, and 74%, respectively.
Published: 2019

8. Glow: Graph Lowering Compiler Techniques for Neural Networks

Author: Rotem, Nadav, Fix, Jordan, Abdulrasool, Saleem, Catron, Garret, Deng, Summer, Dzhabarov, Roman, Gibson, Nick, Hegeman, James, Lele, Meghan, Levenstein, Roman, Montgomery, Jack, Maher, Bert, Nadathur, Satish, Olesen, Jakob, Park, Jongsoo, Rakhov, Artem, Smelyanskiy, Misha, and Wang, Man
Subjects: FOS: Computer and information sciences, Computer Science - Programming Languages, Programming Languages (cs.PL)
Abstract: This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural network dataflow graph into a two-phase strongly-typed intermediate representation. The high-level intermediate representation allows the optimizer to perform domain-specific optimizations. The lower-level instruction-based address-only intermediate representation allows the compiler to perform memory-related optimizations, such as instruction scheduling, static memory allocation and copy elimination. At the lowest level, the optimizer performs machine-specific code generation to take advantage of specialized hardware features. Glow features a lowering phase which enables the compiler to support a high number of input operators as well as a large number of hardware targets by eliminating the need to implement all operators on all targets. The lowering phase is designed to reduce the input space and allow new hardware backends to focus on a small number of linear algebra primitives.
Published: 2018
