Accurate and timely estimation of winter wheat yield has a significant effect on grain markets and policy. Most crop yield estimation methods fall into two categories: those based on crop models and those based on statistical learning. Among statistical learning methods, recent advances in deep learning have made convolutional neural networks (CNNs) state-of-the-art algorithms that can extract deep features of crop growth. The pivotal challenge, however, is how to feed remote sensing images into a CNN. In this paper, we use histogram dimensionality reduction and time series fusion to generate the input layer. The experiment first performed projection transformation, mosaicking, masking, fusion, and clipping on 6 different MODIS images of the research area from 2006 to 2016, and then generated 21 600 fusion images with 12 bands (surface reflectance at 7 different wavelengths from MOD09A1, daytime and nighttime surface temperature from MYD11A2, NDVI and EVI from MOD13A1, and FPAR from MOD15A2H). Next, the range of each band to which winter wheat growth is sensitive was divided into 36 intervals, and histogram statistics were used to reduce each image band to a vector of length 36, so that the remote sensing images of the 228-day growing season yield a matrix of 36×36×12. The yield statistics for the corresponding time and region were used as the output layer to form a complete sample. In this way, a yield estimation sample database of 12 indices was constructed for the winter wheat region of north China (60 prefecture-level cities) from 2006 to 2016, and it was split into a training set and a validation set at a ratio of 10:1 for training and evaluating the yield estimation model. Finally, the neural network structure was designed according to the samples; it consists of an input layer, 7 convolutional layers (c1-c7), 7 activation layers, 7 batch normalization layers, 3 dropout layers, 2 fully connected layers, and an output layer.
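The histogram reduction described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `band_histogram` and `build_sample`, the normalization of counts to fractions, and the reading of the 36×36×12 matrix as bins × time steps × bands are all assumptions made for illustration.

```python
def band_histogram(pixels, lo, hi, n_bins=36):
    """Histogram of pixel values inside the sensitivity range [lo, hi].

    Assumption: counts are normalized to fractions so that regions with
    different numbers of wheat pixels stay comparable.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for value in pixels:
        if lo <= value <= hi:
            # Clamp so that value == hi falls into the last bin.
            index = min(int((value - lo) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [count / total for count in counts]


def build_sample(series, ranges, n_bins=36):
    """Stack per-band histograms over time into one CNN input.

    series[t][b] is a flat list of masked pixel values for time step t and
    band b; ranges[b] is the hypothetical (lo, hi) sensitivity range of band b.
    The result is a nested list indexed as [bin][time step][band].
    """
    n_steps = len(series)
    n_bands = len(ranges)
    sample = [[[0.0] * n_bands for _ in range(n_steps)] for _ in range(n_bins)]
    for t, bands in enumerate(series):
        for b, pixels in enumerate(bands):
            lo, hi = ranges[b]
            for k, fraction in enumerate(band_histogram(pixels, lo, hi, n_bins)):
                sample[k][t][b] = fraction
    return sample
```

With 36 time steps and 12 bands, `build_sample` returns a 36×36×12 structure regardless of how many pixels each region contains, which is what lets regions of different size share one fixed-size input layer.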
The numbers of convolution kernels in c1-c7 are 64, 64, 128, 128, 256, 256, and 256; the kernel size is 3×3; the strides are 2, 1, 2, 2, 2, 1, and 2, respectively; and each convolutional layer uses a zero padding of 1. Batch normalization and ReLU activation are applied after each convolutional layer, and dropout is used in the fully connected layers. The results show that: 1) the root-mean-square error (RMSE) and coefficient of determination (R²) of the CNN model are 183.82 kg/hm² and 0.98 on the training set, and 689.72 kg/hm² and 0.71 on the validation set; 2) with the same network structure, 11 models were trained, each using the samples of one year from 2006 to 2016 as the validation set, and their average RMSE was 772.03 kg/hm². The error was largest in 2007 (RMSE of 920.45 kg/hm²) and smallest in 2012 (RMSE of 632.08 kg/hm²), which indicates that the CNN-based crop estimation algorithm is robust and precise; 3) the accuracy analysis of predicted yield at the municipal level in different provinces for the three years 2007, 2012, and 2016 indicates that the model has higher accuracy in most areas of the northern winter wheat region; in particular, the RMSE in Hebei and Shandong provinces is approximately 500 kg/hm². These results show that CNNs are well suited to estimating winter wheat production and that combining remote sensing with deep learning is a promising approach. The method can be used for remote sensing yield estimation at different scales and in different regions. Compared with traditional methods, this end-to-end learning method benefits from synergy and can obtain the optimal estimation model for the whole area. Moreover, as data accumulate, the estimation accuracy is expected to improve continuously, so the method has good application prospects in national agricultural production forecasting.
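The convolution settings listed above (3×3 kernels, zero padding of 1, strides 2, 1, 2, 2, 2, 1, 2) determine how the 36×36 input shrinks through c1-c7. The sketch below traces those sizes. It assumes the common floor-mode output formula, out = floor((in + 2·padding − kernel) / stride) + 1; the abstract does not say which convention the authors' framework uses, and the helper names are hypothetical.

```python
STRIDES = [2, 1, 2, 2, 2, 1, 2]                # strides of c1-c7 from the abstract
FILTERS = [64, 64, 128, 128, 256, 256, 256]    # kernel counts of c1-c7


def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of one convolution (floor-mode convention, assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size=36):
    """Return (height, width, channels) after each of the 7 convolutional layers."""
    shapes = []
    size = input_size
    for stride, channels in zip(STRIDES, FILTERS):
        size = conv_output_size(size, stride=stride)
        shapes.append((size, size, channels))
    return shapes


def flattened_length(shapes):
    """Number of values handed to the first fully connected layer."""
    height, width, channels = shapes[-1]
    return height * width * channels


if __name__ == "__main__":
    for name, shape in zip(["c1", "c2", "c3", "c4", "c5", "c6", "c7"], trace_shapes()):
        print(name, shape)
    print("flattened:", flattened_length(trace_shapes()))
```

Under these assumptions the spatial size goes 36 → 18 → 18 → 9 → 5 → 3 → 3 → 2, so c7 outputs 2×2×256 and the first fully connected layer receives 1024 values.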
[ABSTRACT FROM AUTHOR]