
Enhancing Model Parallelism in Neural Architecture Search for Multidevice System.

Authors :
Fu, Cheng
Chen, Huili
Yang, Zhenheng
Koushanfar, Farinaz
Tian, Yuandong
Zhao, Jishen
Source :
IEEE Micro. Sep/Oct2020, Vol. 40 Issue 5, p46-55. 10p.
Publication Year :
2020

Abstract

Neural architecture search (NAS) finds favorable network topologies for better task performance. Existing hardware-aware NAS techniques target only reducing inference latency on single-CPU/GPU systems, and the resulting models can hardly be parallelized. To address this issue, we propose ColocNAS, the first synchronization-aware, end-to-end NAS framework that automates the design of parallelizable neural networks for multidevice systems while maintaining high task accuracy. ColocNAS defines a new search space with carefully designed connectivity to reduce interdevice communication and synchronization. ColocNAS consists of three phases: 1) offline latency profiling, which constructs a lookup table of the inference latencies of candidate networks for online runtime approximation; 2) differentiable latency-aware NAS, which simultaneously minimizes inference latency and task error; and 3) reinforcement-learning-based device-placement fine-tuning, which further reduces the latency of the deployed model. Extensive evaluation corroborates ColocNAS's effectiveness in reducing inference latency while preserving task accuracy. [ABSTRACT FROM AUTHOR]
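To make the first two phases concrete, here is a minimal sketch of how a latency lookup table can feed a differentiable latency penalty in the search objective: candidate-operation probabilities (a softmax over architecture parameters) weight the table's measured latencies, so latency becomes differentiable with respect to the architecture parameters. All names (`LATENCY_MS`, `expected_latency`, `total_loss`) and the latency numbers are illustrative assumptions, not the paper's actual implementation.

```python
import math

# Hypothetical latency lookup table (phase 1): measured inference latency
# in milliseconds for each candidate operation on a target device.
# Numbers are illustrative only, not from the paper.
LATENCY_MS = {"conv3x3": 1.8, "conv5x5": 3.1, "skip": 0.1}

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def expected_latency(arch_params):
    """Differentiable latency proxy (phase 2): the softmax-weighted sum of
    table latencies, so the search can trade latency against task error."""
    ops = sorted(LATENCY_MS)
    probs = softmax([arch_params[op] for op in ops])
    return sum(p * LATENCY_MS[op] for p, op in zip(probs, ops))

def total_loss(task_error, arch_params, lam=0.1):
    # Joint objective: task error plus a weighted latency penalty.
    return task_error + lam * expected_latency(arch_params)
```

With uniform architecture parameters, `expected_latency` is simply the mean of the table entries; as training pushes probability mass toward cheap operations such as `skip`, the penalty term shrinks accordingly.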

Details

Language :
English
ISSN :
02721732
Volume :
40
Issue :
5
Database :
Academic Search Index
Journal :
IEEE Micro
Publication Type :
Academic Journal
Accession number :
145693353
Full Text :
https://doi.org/10.1109/MM.2020.3004538