Heterologous image fusion has been widely used to integrate multiple images into one. Fused images typically offer higher definition, stronger edges, and more information than any single source image. Different types of sensors capture image data with different characteristics. Among them, depth sensors based on Time of Flight (ToF) compute distance from the time taken by emitted near-infrared light to return. ToF imaging is commonly used as a beneficial supplement to visible-light cameras, and broad applications can be expected in agriculture, medicine, quality inspection, and machine vision. However, images of a natural scene acquired by a single sensor cannot fully meet the requirements for rapid and accurate fruit identification and target positioning. Image fusion with multi-objective optimization can therefore be extended to heterologous vision systems, particularly in natural scenes. In this study, a method combining multi-scale decomposition with a dual optimization strategy was proposed to fuse ToF and visible-light images in an apple orchard using the Simplified Pulse Coupled Neural Network (SPCNN). The dual parameter-optimization strategy was introduced into the SPCNN model that fuses the Nonsubsampled Contourlet Transform (NSCT) coefficients. The model consisted of a registration module, an encoding module, a multi-scale decomposition module, a single-objective SPCNN fusion model, a multi-objective SPCNN fusion model, and a decoding module. The heterologous vision system was used to register the ToF and visible-light images accurately. Four SPCNN parameters were encoded: the linking-channel feedback term, the linking strength, the dynamic-threshold attenuation factor, and the dynamic-threshold amplification factor. NSCT was then used to decompose the images at multiple scales.
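How an SPCNN turns subband coefficients into firing ("ignition") counts, and how those counts can drive a fusion rule, can be sketched as follows. This is a minimal illustration assuming the commonly cited SPCNN formulation (feeding input equal to the stimulus, exponential decay of internal activity and threshold, a 3×3 linking kernel). The parameter names `alpha_f`, `beta`, `v_l`, `alpha_e`, `v_e`, the kernel weights, the initial threshold of 1.0, and the mapping of the abstract's four encoded parameters onto these symbols are all assumptions. The "keep the coefficient that fired more often" rule is a common choice for PCNN-based fusion, used here for illustration rather than as the authors' exact rule.

```python
import math

# Hypothetical 3x3 linking kernel (centre excluded); not taken from the paper.
KERNEL = [[0.5, 1.0, 0.5],
          [1.0, 0.0, 1.0],
          [0.5, 1.0, 0.5]]


def spcnn_fire_counts(stim, alpha_f, beta, v_l, alpha_e, v_e, iterations=10):
    """Run a simplified PCNN on a 2-D stimulus in [0, 1]; return per-pixel firing counts."""
    rows, cols = len(stim), len(stim[0])
    u = [[0.0] * cols for _ in range(rows)]  # internal activity U
    e = [[1.0] * cols for _ in range(rows)]  # dynamic threshold E (assumed start value)
    y = [[0] * cols for _ in range(rows)]    # pulse output Y from the previous step
    counts = [[0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_y = [[0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                # Linking input L: weighted sum of the neighbours' previous pulses.
                link = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < rows and 0 <= nj < cols:
                            link += KERNEL[di + 1][dj + 1] * y[ni][nj]
                link *= v_l
                # U decays by exp(-alpha_f) and is driven by the linking-modulated stimulus.
                u[i][j] = math.exp(-alpha_f) * u[i][j] + stim[i][j] * (1 + beta * link)
                # Fire when U exceeds the threshold left over from the previous step.
                fired = 1 if u[i][j] > e[i][j] else 0
                new_y[i][j] = fired
                counts[i][j] += fired
                # E decays by exp(-alpha_e) and jumps by v_e after a firing.
                e[i][j] = math.exp(-alpha_e) * e[i][j] + v_e * fired
        y = new_y
    return counts


def fuse_subbands(a, b, params, iterations=10):
    """Per pixel, keep the coefficient whose SPCNN neuron fired more often (ties go to a)."""
    # Normalise |coefficients| of both bands by a shared peak so the stimuli are comparable.
    peak = max(abs(v) for band in (a, b) for row in band for v in row) or 1.0
    stim_a = [[abs(v) / peak for v in row] for row in a]
    stim_b = [[abs(v) / peak for v in row] for row in b]
    ca = spcnn_fire_counts(stim_a, *params, iterations=iterations)
    cb = spcnn_fire_counts(stim_b, *params, iterations=iterations)
    return [[a[i][j] if ca[i][j] >= cb[i][j] else b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]


if __name__ == "__main__":
    params = (0.1, 0.5, 1.0, 0.3, 5.0)  # alpha_f, beta, v_l, alpha_e, v_e: illustrative only
    print(fuse_subbands([[0.0, 0.2]], [[0.8, 0.1]], params))
```

In a full pipeline, a function like `fuse_subbands` would be applied to each pair of NSCT subbands (ToF and visible), with `params` supplied by the optimizer instead of being fixed by hand.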
The SPCNN parameters used in the fusion rules were tuned with an improved artificial bee colony algorithm in a dual optimization comprising single-objective and multi-objective parameter optimization. The decoding module converted each binary vector into real-valued parameters. The objective function of the dual optimization also served as the termination criterion for the SPCNN iterations. Finally, the fused heterologous image was reconstructed by the inverse multi-scale transform. The method improved parameter optimization and reduced the number of SPCNN iterations: the adaptive number of ignitions (firings) was relatively low, about 3 to 7, indicating adaptive segmentation and high efficiency. The success rate of ignition-based recognition reached 100.00%, and the shortest ignition-segmentation time was 91.91 s, at 15:00. Compared with the other fusion models, the proposed model achieved a 100.00% recognition success rate on fused images in all tested periods, covering strong, medium, and weak light at 12:00, 15:00, 18:20, and 19:00. Its fusion time was much lower than that of the SPCNN model, with a minimum of 92.68 s. Four of the five fusion indexes evaluated (average gradient, correlation coefficient, mutual information, entropy, and spatial frequency) reached their largest values in the weak-light period at 18:20. The proposed model performed well in terms of accuracy, time cost, and model size. The findings can serve as a supplement to hierarchical image fusion.
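Two supporting pieces of the optimization loop can be sketched: the decoding step that maps a binary vector to real-valued SPCNN parameters, and three of the no-reference indexes reported above (entropy, average gradient, spatial frequency). This is a minimal sketch under stated assumptions: 10 bits per parameter, linear scaling into hypothetical search ranges, and one common definition of each index (normalisations vary across the literature). The parameter keys are hypothetical labels for the four encoded parameters, and the improved artificial bee colony search itself is not shown.

```python
import math

# Hypothetical search ranges for the four encoded SPCNN parameters (not from the paper).
BOUNDS = [
    ("link_feedback", 0.01, 5.0),    # linking-channel feedback term
    ("link_strength", 0.01, 2.0),    # linking strength
    ("threshold_decay", 0.01, 2.0),  # dynamic-threshold attenuation factor
    ("threshold_gain", 1.0, 50.0),   # dynamic-threshold amplification factor
]
BITS = 10  # assumed number of bits per parameter


def decode(bits):
    """Map a binary vector (sequence of 0/1) to real-valued parameters by linear scaling."""
    if len(bits) != BITS * len(BOUNDS):
        raise ValueError("expected %d bits" % (BITS * len(BOUNDS)))
    params = {}
    for k, (name, lo, hi) in enumerate(BOUNDS):
        chunk = bits[k * BITS:(k + 1) * BITS]
        value = int("".join(str(b) for b in chunk), 2)
        params[name] = lo + value / (2 ** BITS - 1) * (hi - lo)
    return params


def entropy(img, levels=256):
    """Shannon entropy (bits) of an integer grey-level image."""
    counts = [0] * levels
    for row in img:
        for v in row:
            counts[v] += 1
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    m, n = len(img), len(img[0])
    acc = 0.0
    for i in range(m - 1):
        for j in range(n - 1):
            dx = img[i + 1][j] - img[i][j]
            dy = img[i][j + 1] - img[i][j]
            acc += math.sqrt((dx * dx + dy * dy) / 2)
    return acc / ((m - 1) * (n - 1))


def spatial_frequency(img):
    """sqrt(RF^2 + CF^2), with RF and CF from squared row and column differences."""
    m, n = len(img), len(img[0])
    rf = sum((img[i][j] - img[i][j - 1]) ** 2 for i in range(m) for j in range(1, n))
    cf = sum((img[i][j] - img[i - 1][j]) ** 2 for i in range(1, m) for j in range(n))
    return math.sqrt((rf + cf) / (m * n))
```

In a single-objective stage, one such index (or a weighted sum of several) could score each decoded candidate; a multi-objective stage would keep them as separate objectives. How the authors actually combined the indexes is not stated in the abstract.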