To overcome the tendency of convolutional networks to produce restorations that are over-smoothed, blurred, or discontinuous, a novel transformer network with cross-window aggregated attention is proposed. Our network as a whole is constructed as a generative adversarial network (GAN) model. By embedding the Window Aggregation Transformer (WAT) module, we improve information aggregation between windows without increasing the computational complexity and effectively capture long-range dependencies in the image, addressing the limitation that convolutional operations extract only local features. First, the encoder extracts multi-scale features of the image with convolution kernels of different sizes; second, the feature maps at different scales are fed into the WAT module, which aggregates their feature information; finally, the decoder reconstructs the image from these features, and the generated image is passed to a global discriminator that distinguishes real images from fake ones. Experiments verify that the designed transformer window-attention network produces richer and more natural structured textures when restoring images with large missing regions or complex structures.
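As a rough illustration of the pipeline the abstract describes, the following is a minimal PyTorch-style sketch. The module names, kernel sizes (3/5/7), the 8×8 attention window, and the use of a depthwise convolution as the cross-window aggregation step are all hypothetical stand-ins, not the paper's actual implementation; the real WAT aggregation rule, losses, and layer counts are not specified here.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Extracts features with convolution kernels of several sizes."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        # hypothetical kernel sizes; the paper's choices may differ
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, dim, k, stride=2, padding=k // 2)
            for k in (3, 5, 7)
        )
        self.fuse = nn.Conv2d(3 * dim, dim, 1)  # merge the scale branches

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class WATBlock(nn.Module):
    """Window self-attention plus a cheap cross-window mix.

    A stand-in for the paper's WAT module: attention runs inside each
    non-overlapping window, and a depthwise convolution lets information
    flow across window borders. The real aggregation rule may differ.
    """
    def __init__(self, dim=64, window=8, heads=4):
        super().__init__()
        self.w = window
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, x):                        # x: (B, C, H, W)
        B, C, H, W = x.shape
        w = self.w
        # partition the map into (B * num_windows, w*w, C) token sequences
        t = x.view(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        a, _ = self.attn(*(self.norm(t),) * 3)
        # undo the partition and add the residual
        a = a.view(B, H // w, W // w, w, w, C)
        a = a.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        x = x + a
        return x + self.cross(x)                 # cross-window aggregation

class Generator(nn.Module):
    """Encoder -> WAT -> decoder, mirroring the described pipeline."""
    def __init__(self):
        super().__init__()
        self.enc = MultiScaleEncoder()
        self.wat = WATBlock()
        self.dec = nn.Sequential(                # reconstructs the image
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(64, 3, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.dec(self.wat(self.enc(x)))

# Minimal global discriminator sketch: judges the whole generated image.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),
)

if __name__ == "__main__":
    img = torch.randn(1, 3, 256, 256)            # dummy damaged image
    out = Generator()(img)
    print(out.shape, discriminator(out).shape)   # (1,3,256,256), (1,1,64,64)
```

The sketch keeps the window attention itself local (quadratic only in the window size, not the image size), so the cross-window step is what carries information between windows, consistent with the abstract's claim of improved aggregation without added computational complexity.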