This paper presents an ultra-low-power dual-mode automatic sleep staging processor that uses a neural-network (NN)-based decision tree classifier to enable real-time, long-term, and flexible sleep monitoring. Ultra-low power is achieved through an algorithm-hardware co-design approach that jointly exploits optimization opportunities at the algorithm, architecture, and circuit levels to minimize power consumption; as a result, the first sub-10-$\mu \text{W}$ NN-based automatic sleep staging processor is realized. The dual-mode NN models are trained on an open-source large-scale dataset. The default mode achieves 81.0% classification accuracy using two signals, one electroencephalography (EEG) signal and one electromyography (EMG) signal, and the compact mode achieves 78.5% accuracy using only one EEG signal. In addition, the proposed design is verified on the National Taiwan University Hospital (NTUH) dataset, on which it achieves 81.1% and 77.1% accuracy in the default and compact modes, respectively. A prototype chip implemented in a 180-nm CMOS process occupies a total area of 11.74 $\text{mm}^{2}$ and operates at 10 kHz while consuming 4.96 $\mu \text{W}$ at 1.2 V.