1. 9.9 A Background-Noise and Process-Variation-Tolerant 109nW Acoustic Feature Extractor Based on Spike-Domain Divisive-Energy Normalization for an Always-On Keyword Spotting Device
- Author
-
Sung Justin Kim, Aurel A. Lazar, Minhao Yang, Dewei Wang, and Mingoo Seok
- Subjects
Normalization (statistics) ,Spiking neural network ,Background noise ,Noise ,Signal-to-noise ratio ,Robustness (computer science) ,business.industry ,Computer science ,Keyword spotting ,Pattern recognition ,Artificial intelligence ,business ,Energy (signal processing) - Abstract
In mobile and edge devices, always-on keyword spotting (KWS) is an essential function to detect wake-up words. Recent works achieved extremely low power dissipation down to $\sim500$ nW [1]. However, most of them adopt noise-dependent training, i.e. training for a specific signal-to-noise ratio (SNR) and noise type [1], and therefore their accuracies degrade for different SNR levels and noise types that are not targeted in the training (Fig. 9.9.1, top left). To improve robustness, so-called noise-independent training can be considered, which is to use the training data that includes all the possible SNR levels and noise types [2]. But, this approach is challenging for an ultra-low-power device since it demands a large neural network to learn all the possible features. A neural network of a fixed size has its own memory capacity limit and reaches a plateau in accuracy if it has to learn more than its limit (Fig. 9.9.1, top right). On the other hand, it is known that biological acoustic systems employ a simpler process, called divisive energy normalization (DN), to maintain accuracy even in varying noise conditions [3]. In this work, therefore, by adopting such a DN, we prototype a normalized acoustic feature extractor chip (NAFE) in 65nm. The NAFE can take an acoustic signal from a microphone and produce spike-rate coded features. We pair NAFE with a spiking neural network (SNN) classifier chip [4], creating the end-to-end KWS system. The proposed system achieves 89-to-94% accuracy across -5 to 20dB SNRs and four different noise types on HeySnips [5], while the baseline without DN achieves a much lower accuracy of 71-87%. NAFE consumes up to 109nW and the KWS system 570nW.
- Published
- 2021
- Full Text
- View/download PDF