Error Resilient Machine Learning for Safety-Critical Systems: Position Paper
- Author
Zitao Chen, Guanpeng Li, and Karthik Pattabiraman
- Subjects
Artificial neural network, Computer science, Commodity hardware, Fault injection, Machine learning, Soft error, Life-critical system, Redundancy (engineering), Position paper, Industrial robotics, Artificial intelligence
- Abstract
Machine learning (ML) has increasingly been adopted in safety-critical systems such as autonomous vehicles (AVs) and industrial robotics. In these domains, reliability and safety are important considerations, and hence it is critical to ensure the resilience of ML systems to faults and errors. At the same time, soft errors are becoming more frequent in commodity computer systems due to the effects of technology scaling and reduced supply voltages. Further, traditional solutions for masking hardware faults, such as Triple-Modular Redundancy (TMR), are prohibitively expensive in terms of their energy and performance overheads. Therefore, there is a compelling need to ensure the resilience of ML applications to soft errors on commodity hardware platforms.

We first experimentally assess the resilience of safety-critical ML applications to soft errors. We demonstrate through fault injection experiments that even a single bit flip due to a soft error can lead to misclassification in Deep Neural Network (DNN) applications deployed in AVs, leading to safety violations. However, not all errors in a DNN result in severe consequences such as safety violations, and hence it is sufficient to protect the DNN from the ones that do. Unfortunately, finding all possible errors that result in safety violations is a very compute-intensive task. We propose BinFI, a fault injection approach that efficiently injects critical faults that are highly likely to result in safety violations, based on the unique properties of DNNs. Finally, we propose Ranger, an approach to protect DNNs from critical faults with minimal performance overheads and no accuracy loss. We conclude by presenting some of our ongoing work, and the future challenges in this area.
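The two key ideas in the abstract, single-bit-flip soft errors and protecting DNNs from the resulting out-of-range values, can be illustrated with a minimal sketch. This is not the paper's BinFI or Ranger implementation; it assumes float32 values and hypothetical range bounds, and the `restrict_range` helper is an illustrative stand-in for range-restriction style protection:

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit in the IEEE-754 single-precision encoding of `value`,
    emulating a soft error in a stored activation or weight."""
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    (corrupted,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return corrupted

def restrict_range(value: float, lo: float, hi: float) -> float:
    """Clamp a value to a known-valid range (hypothetical bounds lo/hi):
    a simplified stand-in for range-restriction style fault protection."""
    return max(lo, min(hi, value))

activation = 2.0
critical = flip_bit(activation, 29)  # high exponent bit: value explodes to 2**65
benign = flip_bit(activation, 0)     # low mantissa bit: negligible change
guarded = restrict_range(critical, 0.0, 8.0)  # clamped back into a sane range
```

The example shows why only some bit flips are critical: a flip in a high exponent bit changes the value by tens of orders of magnitude (enough to dominate downstream computations and flip a classification), while a low mantissa-bit flip is numerically negligible, which motivates targeting only the critical faults.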
- Published
2020