Identification of plant diseases has become a significant problem for pattern recognition and image processing in intelligent agriculture. However, a conventional single feature cannot capture the typical characteristics of crop diseases, because many parameters of the growth environment, including soil temperature and humidity, pH value, and air temperature and humidity, are closely related to plant disease. Recently, vectorized learning of multi-structured data and the optimal combination of features have provided a new way to improve the accuracy of disease diagnosis. Taking 50 samples and four types of cucumber diseases as examples, including powdery mildew, fusarium wilt, keratoderma, and sclerotinia sclerotiorum, this study aims to establish an optimal combination model for multi-structure plant disease data, using ensemble learning of in situ environmental parameters collected via an intelligent Internet of Things. The collected data numbered 4332 for powdery mildew, 4213 for keratoderma, 234 for anthracnose, and 2895 for fusarium wilt. Each disease corresponded to several expert knowledge descriptions, which were combined with the number of expert knowledge descriptions for that disease. Based on the perception model of multimedia sensor networks, a multi-path packet transmission method was proposed to ensure the reliability of multi-structure data transmission. The collected multi-source heterogeneous data were grouped and transmitted along multiple paths to reduce data loss and transmission time while improving the accuracy of data acquisition. The heterogeneous modal information was then mapped onto a shared subspace, where the similarity of heterogeneous data could be measured directly within one framework. Furthermore, the relationships among unknown heterogeneous data were derived using the integrated index system of multi-structure parameters.
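The shared-subspace step described above can be illustrated with a minimal sketch. It assumes linear projections and cosine similarity, neither of which is specified in the abstract; the names `project`, `cosine_similarity`, `W_ENV`, and `W_IMG`, as well as all numbers, are hypothetical. In practice the projection matrices would be learned from paired data rather than written by hand.

```python
import math

def project(matrix, vector):
    """Map a modality-specific vector into the shared subspace (matrix times vector)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical, hand-written projections into a 2-D shared subspace.
# Assumption: a real system would learn these from paired samples.
W_ENV = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]        # 3 environmental features -> 2 dimensions
W_IMG = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]   # 4 image features -> 2 dimensions

env_z = project(W_ENV, [0.8, 0.2, 0.5])       # made-up sensor reading
img_z = project(W_IMG, [0.9, 0.7, 0.1, 0.3])  # made-up image descriptor

# Once both vectors live in the same space, they can be compared directly.
similarity = cosine_similarity(env_z, img_z)
```

Because both modalities end up as vectors of the same length, a single similarity measure applies to any pair of them, which is the point of mapping heterogeneous data onto one subspace.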
A learning vector quantization (LVQ) neural network was used for multi-fusion classification by fusing the structured environmental parameters of crop growth with unstructured image features. Three types of data features were fused in the input layer to establish semantic associations: real-time environmental Internet of Things data, crop disease image data, and expert knowledge text data. Multi-structure parameter ensemble learning was then used to diagnose disease types, with sample recognition rates ranging from 79.4% to 93.6%. Specifically, the recognition rate of powdery mildew was relatively high, owing to its distinct image features and the clear relationship between environment and disease. The recognition rate of fusarium wilt was lower than that of the other diseases, because its disease characteristics and image features resemble those of anthracnose, with emphasis placed on the expert knowledge description of leaf spot. To verify the robustness of the proposed algorithm, a convolutional neural network and a simple image recognition technique based on deep transfer learning were applied to the same four typical cucumber diseases. The experimental results show that the convolutional neural network achieved a recognition rate similar to that of the proposed method, but its background recognition time was longer, possibly because the proposed method reduces the dimensionality of the disease image data. The deep transfer learning method normally requires a large number of input images for training, whereas the number of disease images actually available was not enough to meet the requirements of deep learning. Its recognition rate can therefore be reduced by the insufficient samples.
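To make the classification step concrete, the following is a minimal sketch of the textbook LVQ1 update rule applied to fused feature vectors. It is not the authors' network: fusion by simple concatenation, one prototype per class initialised at the class mean, the linearly decaying learning rate, and the names `fuse_features`, `train_lvq1`, and `predict` are all assumptions made for illustration, and the toy data are made up.

```python
import math
import random

def fuse_features(env, image, text):
    """Concatenate one sample's feature vectors from the three sources."""
    return list(env) + list(image) + list(text)

def _distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(prototypes, x):
    """Index of the prototype whose vector is closest to x."""
    return min(range(len(prototypes)),
               key=lambda k: _distance(prototypes[k][0], x))

def train_lvq1(samples, labels, n_epochs=30, lr=0.1, seed=0):
    """Train LVQ1 and return a list of (prototype_vector, class_label) pairs."""
    rng = random.Random(seed)
    # Assumption: one prototype per class, initialised at the class mean.
    prototypes = []
    for c in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        prototypes.append((mean, c))
    order = list(range(len(samples)))
    for epoch in range(n_epochs):
        rate = lr * (1.0 - epoch / n_epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x = samples[i]
            vec, c = prototypes[nearest(prototypes, x)]
            # Pull the winning prototype toward x if its class matches,
            # push it away otherwise.
            sign = 1.0 if c == labels[i] else -1.0
            for j in range(len(vec)):
                vec[j] += sign * rate * (x[j] - vec[j])
    return prototypes

def predict(prototypes, x):
    """Return the class label of the prototype nearest to x."""
    return prototypes[nearest(prototypes, x)][1]

# Toy, made-up data: two environmental, one image, and one text feature each.
samples = [
    fuse_features([0.1, 0.2], [0.0], [0.1]),
    fuse_features([0.2, 0.1], [0.1], [0.0]),
    fuse_features([0.9, 0.8], [1.0], [0.9]),
    fuse_features([0.8, 0.9], [0.9], [1.0]),
]
labels = ["powdery mildew", "powdery mildew", "fusarium wilt", "fusarium wilt"]
model = train_lvq1(samples, labels)
```

A new observation is classified by fusing its three feature vectors in the same order and calling `predict(model, fused_vector)`. Real features would need to be scaled to comparable ranges before fusion, since the nearest-prototype rule relies on Euclidean distance.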
An intelligent diagnosis technology for cucumber diseases was established based on multi-structure parameter ensemble learning, providing a sound basis for correlation analysis between image features and environmental parameters. Combined with environmental and expert knowledge resources, subspace mapping was used to handle the heterogeneity of different modal data, thereby ensuring identification accuracy with less identification time. [ABSTRACT FROM AUTHOR]