The navel orange planting area in the Gannan region of China ranks first in the world, with an annual output of one million tons. However, most navel oranges are still picked by hand, which is labor-intensive and inefficient, so intelligent picking robots are urgently needed. To pick and grade fruit efficiently, a picking robot must obtain the spatial position and size of each navel orange in real time. This study proposed an algorithm framework, OrangePointSge, based on an improved YOLACT model and least-square sphere fitting. The real-time instance segmentation algorithm YOLACT generated an instance mask, which was used to crop the registered depth image and obtain the depth point cloud of each fruit. The least-square method was then used to fit the fruit shape as a sphere, yielding its centroid coordinates and radius in the camera coordinate system. First, RGB-D data of navel orange fruit were collected with Microsoft's latest consumer-level depth camera (Azure Kinect DK), and a navel orange instance segmentation dataset and an augmented dataset were built and released as open source. The dataset contained 2 178 images with 8 712 samples, of which 4 682 were non-occluded fruit and 4 030 were slightly occluded fruit. The YOLACT algorithm was then adapted to navel orange detection to output an instance mask for each fruit. We replaced the original ResNet (deep residual network) backbone with HRNet (High-Resolution Net) to simplify the model, and used the hrnet_w48 structure for feature extraction to improve detection accuracy. We also optimized the non-maximum suppression process to improve detection speed. 
Several groups of comparative tests were also set up, covering the performance of YOLACT with different backbone networks and the performance of different algorithms in different scenarios. To improve the generalization ability of the trained model, we applied a series of augmentation methods to the color images, mainly scale change, color change, brightness change, and added noise. For small, distant targets, we used oversampling during training to improve detection accuracy. As a result, the recognition speed and mask accuracy for navel oranges were higher than those of the other algorithms: the detection speed was 44.63 frames/s and the average accuracy was 31.15%. The mask generated by the improved YOLACT was used to crop the depth image registered to the color image, giving the depth values of each fruit, from which the fruit depth point cloud was generated. The least-square method was then used to fit the point cloud and obtain the centroid coordinates and radius of the fruit, which guide the robot in grading and picking. We also compared the method with the RANSAC (Random Sample Consensus) algorithm. The results showed that the least-square method was faster than RANSAC while maintaining adequate accuracy. When the point cloud contained 1 400-2 000 points, the fitting time was 1.99 ms, the positioning error was 0.49 cm, the root mean square error of the fitted radius was 0.43 cm, and the root mean square error of the volume was 52.6 mL. Experiments on the effect of distance on fitting accuracy showed that when the point cloud contained more than 800 points and the distance was less than 1.0 m, the positioning error was kept within 0.46 cm. Finally, by introducing parallel computing and combining the two processing stages above, the overall processing speed of OrangePointSge reached 29.4 frames/s, a good balance between accuracy and speed. 
Thus, the proposed algorithm is conducive to practical application and engineering deployment. [ABSTRACT FROM AUTHOR]
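
The localization step described above (cropping the registered depth image with an instance mask, then fitting a sphere to the resulting point cloud) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `mask_to_points` and `fit_sphere`, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, and the algebraic (linear) form of the sphere fit are all assumptions made for illustration; the abstract states only that a least-square method was used. The sketch also assumes depth is given in metres and that a depth of 0 marks an invalid pixel.

```python
import numpy as np


def mask_to_points(depth_m, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels to 3D points in the camera frame.

    Assumes a pinhole camera model and a depth image (in metres) that is
    already registered to the color image the mask was computed on.
    """
    valid = mask.astype(bool) & (depth_m > 0)   # drop pixels with no depth
    v, u = np.nonzero(valid)                    # row (v) and column (u) indices
    z = depth_m[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack((x, y, z))


def fit_sphere(points):
    """Algebraic least-squares sphere fit; returns (center, radius).

    Rewrites |p - c|^2 = r^2 as the linear system
    2*p.c + (r^2 - |c|^2) = |p|^2 and solves it for c and the constant term.
    """
    A = np.column_stack((2.0 * points, np.ones(len(points))))
    f = np.sum(points ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, f, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius


# Usage on synthetic data: only the camera-facing cap of the fruit is
# visible, so the points cover part of a sphere rather than all of it.
rng = np.random.default_rng(0)
true_center = np.array([0.10, -0.05, 0.80])     # metres, hypothetical
true_radius = 0.04                              # metres, hypothetical
theta = rng.uniform(0.0, np.pi / 3, 1500)       # angle from the -z axis
phi = rng.uniform(0.0, 2.0 * np.pi, 1500)
cap = true_center + true_radius * np.column_stack((
    np.sin(theta) * np.cos(phi),
    np.sin(theta) * np.sin(phi),
    -np.cos(theta),
))
center, radius = fit_sphere(cap)
volume_ml = 4.0 / 3.0 * np.pi * radius ** 3 * 1e6   # m^3 to mL
```

Because the fit reduces to a single linear solve, its cost grows only linearly with the number of points, which is consistent with the speed advantage over an iterative sampling scheme such as RANSAC reported above. The linear form is sensitive to outliers, however, so real depth data would likely need filtering before fitting.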