High-precision recognition of Camellia oleifera fruits in natural environments is limited by multiple occlusions and the small size of the fruits. In this study, COF-YOLOv5s, an improved model based on YOLOv5s, was proposed to locate Camellia oleifera fruits accurately and rapidly. The model was improved in three ways. First, a small target detection layer was added. Second, the lightweight Faster Block module from FasterNet was embedded into the C3 module (Faster-C3). Third, the Biformer attention mechanism was added. Experimental results show that adding only Faster-C3 to YOLOv5s increased the mean average precision (mAP), recall (R), and precision (P) by 1.8, 5.5, and 1.6 percentage points, respectively, compared with the original YOLOv5s, while the inference time decreased by 0.5 percentage points and 3.6 ms, indicating that Faster-C3 balanced detection accuracy and speed. The small target detection layer significantly improved mAP, R, and P, by 1.8, 4.2, and 3.2 percentage points, respectively, compared with the original model, but increased the inference time by 2.3 ms. Faster-C3 was then incorporated into the network together with Biformer and the small target detection layer, which reduced both the inference time and the parameter count: the Faster Block embedded into C3 mitigated the increase in parameter count and memory access caused by adding the attention mechanism and the small target detection layer. With all three improvements incorporated into the network, mAP, R, and P increased by 4.4, 7.5, and 5.0 percentage points, respectively, compared with the original network. The largest increase was in R, meaning that the network reduced the miss rate, while the inference time was only 1.8 ms longer than that of the original network, indicating the effectiveness of the model. On the test set, the improved network achieved P, R, and mAP of 97.6%, 97.8%, and 99.1%, respectively, which were 5.0, 7.5, and 4.4 percentage points higher than those of the original network. The inference time was 10.3 ms, and the model weight file was only 16.1 MB.
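The baseline scores of the original YOLOv5s are not stated directly, but they follow from the reported final scores and percentage-point gains. The following is a minimal Python sketch of that arithmetic. The names `implied_baseline` and `relative_change` are illustrative and not from the paper, and the derived baselines are implied values rather than figures quoted by the authors.

```python
# Scores reported for COF-YOLOv5s on the test set (percent).
improved = {"P": 97.6, "R": 97.8, "mAP": 99.1}
# Reported gains over the original YOLOv5s (percentage points).
gain_pp = {"P": 5.0, "R": 7.5, "mAP": 4.4}


def implied_baseline(scores, gains):
    """Subtract percentage-point gains to recover the implied baseline scores."""
    return {name: round(scores[name] - gains[name], 1) for name in scores}


def relative_change(old, new):
    """Relative change in percent, which differs from a percentage-point gap."""
    return (new - old) / old * 100.0


baseline = implied_baseline(improved, gain_pp)

# Inference time: 10.3 ms reported, stated to be 1.8 ms longer than the original.
baseline_ms = round(10.3 - 1.8, 1)

if __name__ == "__main__":
    print("implied baseline scores (%):", baseline)
    print("implied baseline inference time (ms):", baseline_ms)
    rel_r = relative_change(baseline["R"], improved["R"])
    print("relative change in R (%):", round(rel_r, 1))
```

The `relative_change` helper is included only to show that a 7.5 percentage point gain in R is not the same quantity as a 7.5% relative improvement.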
Finally, the improved model was deployed on a Jetson Xavier NX combined with a ZED mini camera, and identification and positioning experiments were carried out on Camellia oleifera fruits. In indoor experiments, the recall rate of COF-YOLOv5s was 91.7%, which was 47.3 percentage points higher than before the improvement. In outdoor experiments, the recall rate was 68.8% for green Camellia oleifera fruits and 64.3% for small red Camellia oleifera fruits under weak light conditions. Both the indoor and outdoor results deviated from those obtained on the test set. The main reason was that the test-set images were captured with the camera only about 0.2-0.4 m from the target, making them close-up shots, whereas the camera was about 1.2 m from the fruit in the indoor/outdoor harvesting experiments, which is equivalent to a long shot. The resulting identification errors led to lower recall rates in the indoor/outdoor experiments than on the test set. The findings can provide feasible theoretical support for upgrading agricultural equipment, in order to realize intelligent and large-scale production in the crop industry. [ABSTRACT FROM AUTHOR]
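The gap between the close-up images and the harvesting experiments can be put in rough numbers. The sketch below converts the reported recall rates into miss rates and estimates how much smaller a fruit appears at 1.2 m than at 0.2-0.4 m. It assumes a simple pinhole camera model and the same camera and focal length at both distances, which the abstract does not state. The function names are illustrative only.

```python
def miss_rate(recall_pct):
    """Miss rate in percent, taken as the complement of the recall rate."""
    return round(100.0 - recall_pct, 1)


def apparent_scale(near_distance_m, far_distance_m):
    """Linear image size at the far distance relative to the near distance.

    Assumes a pinhole model with the same camera and focal length, so that
    apparent size is inversely proportional to distance.
    """
    return near_distance_m / far_distance_m


# Recall rates reported for the deployment experiments (percent).
reported_recall = {
    "indoor": 91.7,
    "outdoor, green fruit": 68.8,
    "outdoor, small red fruit, weak light": 64.3,
}
miss = {name: miss_rate(r) for name, r in reported_recall.items()}

# Indoor recall before the improvement, implied by the 47.3-point gain.
indoor_recall_before = round(91.7 - 47.3, 1)

# Close-up distances of about 0.2-0.4 m versus about 1.2 m during harvesting.
linear_scale = [apparent_scale(d, 1.2) for d in (0.2, 0.4)]
area_scale = [s ** 2 for s in linear_scale]

if __name__ == "__main__":
    print("miss rates (%):", miss)
    print("implied indoor recall before improvement (%):", indoor_recall_before)
    print("linear size ratio at 1.2 m:", [round(s, 3) for s in linear_scale])
    print("pixel-area ratio at 1.2 m:", [round(a, 3) for a in area_scale])
```

Under these assumptions a fruit at 1.2 m spans roughly one sixth to one third of the width it has in the close-up images, so the field images contain much smaller targets than the close-up ones.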