Face images captured in uncontrolled environments are affected by complex factors such as illumination and pose changes, which lower the face detection rate and degrade accuracy in facial expression recognition. To address these problems, an expression recognition method based on a residual network with an embedded attention mechanism is proposed. In the face detection stage, an improved RetinaFace algorithm performs multi-view face detection and extracts the face region. In the feature extraction stage, ResNet-50 serves as the backbone network. First, the preprocessed face images are passed sequentially through the channel attention network and the spatial attention network of the model to explicitly model global interdependencies in the image. Second, an average pooling layer is added to the shortcut connection of the downsampling residual units (those drawn with dashed shortcuts) to perform the downsampling operation. This fine-tuning of the residual module strengthens the mapping between input features, so that the extracted expression features pass through the network more completely and less feature information is lost. Finally, the convolutional block attention module (CBAM) is applied again in the network to enhance the channel and spatial information of local expression features, emphasize regions of the feature map that are highly relevant to expressions, and suppress interference from irrelevant regions, which speeds up network convergence and improves the expression recognition rate. In comparison with the baseline algorithm, the method achieves accuracies of 87.65% and 73.57% on the RAF-DB and FER2013 expression datasets, respectively.
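The two building blocks named in the abstract, CBAM-style channel and spatial attention and a shortcut that downsamples by average pooling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the reduction ratio, the 7x7 spatial kernel, the 2x2 pooling window, the 1x1 projection after pooling, and the random weights are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Reweight channels of x (C, H, W) using a shared two-layer MLP.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); the reduction
    ratio r is an assumption, not a value taken from the abstract.
    """
    avg = x.mean(axis=(1, 2))  # (C,) global average pooling
    mx = x.max(axis=(1, 2))    # (C,) global max pooling

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    scale = sigmoid(mlp(avg) + mlp(mx))      # (C,) weights in (0, 1)
    return x * scale[:, None, None]


def spatial_attention(x, kernel):
    """Reweight spatial positions of x (C, H, W).

    kernel has shape (2, k, k) with k odd and is applied as a
    cross-correlation over the channel-wise mean and max maps.
    """
    k = kernel.shape[-1]
    pad = k // 2
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])       # (2, H, W)
    desc = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    windows = np.lib.stride_tricks.sliding_window_view(
        desc, (k, k), axis=(1, 2)
    )                                                      # (2, H, W, k, k)
    attn = sigmoid(np.einsum("chwij,cij->hw", windows, kernel))
    return x * attn[None]


def avgpool_shortcut(x, w_proj):
    """Downsample a shortcut by 2x2 average pooling, then project channels.

    x has shape (C_in, H, W) with even H and W; w_proj has shape
    (C_out, C_in) and plays the role of a 1x1 convolution (assumed design).
    """
    c, h, w = x.shape
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return np.einsum("oc,chw->ohw", w_proj, pooled)


# Hypothetical toy input: 8 channels, 6x6 feature map, random weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

refined = spatial_attention(channel_attention(x, w1, w2), kernel)
shortcut = avgpool_shortcut(x, rng.standard_normal((16, 8)) * 0.1)
print(refined.shape, shortcut.shape)
```

Channel attention is applied before spatial attention, matching the order given in the abstract. Pooling before the projection lets every input position contribute to the shortcut output, which is one plausible reading of how the modification reduces the loss of feature information.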