Monochamus alternatus and Arhopalus rusticus are two important trunk-boring pests of pine trees. Timely information on their population trends is required for the precise prevention and control of insect pests in pine forests. In this study, a remote intelligent monitoring system was constructed using machine vision, comprising a trapping module, a beetle detection module, and a web end. The trapping module was typically placed in key areas of pine forests to capture the longhorn beetles, and images of the trapped beetles were then collected by cameras in a timely manner. A lightweight detection model (GMW-YOLOv5s), built on the deep learning YOLOv5s model, was deployed at the edge to recognize and count the longhorn beetles. The detection data were transmitted wirelessly and presented on the web end to improve the traceability of the beetles’ distribution. The improved YOLOv5s model was used in the detection module to detect the different categories of longhorn beetles. The specific procedures were as follows. Firstly, the Ghost module was adopted in the YOLOv5s backbone network to reduce the number of model parameters, yielding a lightweight network. Secondly, a multi-scale detection mechanism was introduced into the neck network to capture the dependency between deep and shallow semantic information across scales; the weights of the shallow feature layers helped to increase the detection capability for tiny targets. Finally, the WIoU (wise intersection over union) bounding box regression loss function was introduced to improve the localization accuracy of longhorn beetles. The experimental results show that the intelligent monitoring system achieved better performance. Introducing the Ghost module into the detection module reduced the model size by 6.9 M and the number of parameters by 47.6%.
Multi-scale detection improved the precision and recall by 0.7% and 0.4%, respectively. The mAP0.5 increased to 96.4% with the introduction of the WIoU loss function. The GMW-YOLOv5s model integrated all three improvements to detect the longhorn beetles. The precision and recall of the improved model were 94.4% and 93.6%, respectively. The mean average precision at IoU=0.5 (mAP0.5) reached 96.2%, which was higher than that of the original model. The improved model showed better image feature learning and object detection, indicating superior performance in the detection of small objects. A comparison was also made with several mainstream object detection models, including Faster-RCNN, SSD, YOLOX, YOLOv7-tiny, YOLOv5n, and YOLOv5s. Indicators of the GMW-YOLOv5s model, such as precision, recall, and mAP0.5, were improved. The positions of longhorn beetles were detected accurately, especially in densely distributed scenes. In addition, the improved model effectively distinguished some targets and thereby avoided missed detections. A single inference time of 1.4 s and a model size of 9.3 M were suitable for deployment on edge devices with limited computational resources, fully meeting the accuracy and intelligence requirements of the longhorn beetle detection task. Furthermore, in counting the two types of longhorn beetles, the mean average precision and recall were 99.0% and 98.7%, respectively. The model counts and manual counts were relatively close to each other. The number of omissions and misdetections was low, indicating a small error in counting the longhorn beetles in an image. The monitoring system can be expected to collect beetle images regularly and automatically, and then accurately detect and count them at the edge end. The trend in the number of longhorn beetles can also be viewed through the system web interface.
Therefore, the system is of great significance for the remote intelligent monitoring of longhorn beetles in trap scenarios, and in particular for raising the level of intelligence in forest pest control. [ABSTRACT FROM AUTHOR]
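
The precision and recall figures reported above rest on matching predicted boxes to labelled boxes at an IoU threshold of 0.5. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates, that predictions are already sorted by descending confidence, and that matching is greedy and one-to-one; the function names `iou` and `precision_recall` are hypothetical. The full mAP0.5 metric additionally averages precision over recall levels and classes, which is omitted here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """Precision and recall for one image and one class.

    Assumption: `predictions` is sorted by descending confidence, and each
    ground-truth box may be matched by at most one prediction.
    """
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    false_positives = len(predictions) - true_positives
    false_negatives = len(ground_truths) - true_positives
    precision = true_positives / (true_positives + false_positives) if predictions else 0.0
    recall = true_positives / (true_positives + false_negatives) if ground_truths else 0.0
    return precision, recall
```

In this sketch, a prediction that overlaps no unmatched labelled beetle by at least the threshold counts as a misdetection, and a labelled beetle left unmatched counts as an omission, which corresponds to the two error types mentioned for the counting results.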