Objective: Wine grapes are severely affected by leafroll disease, which impairs vine growth and degrades the color, taste, and flavor of wine. Timely and accurate diagnosis of leafroll disease severity is crucial for preventing and controlling the disease and for improving wine grape fruit quality and wine-making potential. Unmanned aerial vehicle (UAV) remote sensing technology provides high-resolution images of wine grape vineyards, which capture the features of grapevine canopies with different levels of leafroll disease severity. Deep learning networks can extract complex, high-level features from UAV remote sensing images and perform fine-grained classification of leafroll disease infection severity. However, diagnosing leafroll disease severity is challenging because the data distribution across infection levels and categories in UAV remote sensing images is imbalanced.
Method: A novel method for diagnosing leafroll disease severity at the canopy scale was developed using UAV remote sensing technology and deep learning. To address the imbalanced data distribution of different infection levels and categories in UAV remote sensing images, which was the main challenge of this task, a method combining deep learning fine-grained classification with generative adversarial networks (GANs) was proposed. In the first stage, GANformer, a Transformer-based GAN model, was used to generate diverse and realistic virtual canopy images of grapevines with different levels of leafroll disease severity. To further analyze the image generation quality of GANformer, t-distributed stochastic neighbor embedding (t-SNE) was used to visualize the learned features of real and simulated images. In the second stage, the CA-Swin Transformer, an improved image classification model based on the Swin Transformer and a channel attention mechanism, was used to classify the patch images into different classes of leafroll disease infection severity.
The CA-Swin Transformer used a self-attention mechanism to capture the long-range dependencies of image patches, and it enhanced the feature representation of the Swin Transformer model by adding a channel attention mechanism after each Transformer layer. The channel attention (CA) mechanism consisted of two fully connected layers and an activation function, which extracted correlations between different channels and amplified the informative features. The ArcFace loss function and an instance normalization layer were also used to enhance the fine-grained feature extraction and downsampling ability for grapevine canopy images. The UAV images of wine grape vineyards were collected and processed into orthomosaic images, which were labeled into three categories (healthy, moderate infection, and severe infection) using in-field survey data. A sliding window method was used to extract patch images and labels from the orthomosaic images for training and testing. The performance of the improved method was compared with that of the baseline model using different loss functions and normalization methods. The distribution of leafroll disease severity in vineyards was mapped using the trained CA-Swin Transformer model.
Results and Discussions: The experimental results showed that GANformer could generate high-quality virtual canopy images of grapevines, with a Fréchet inception distance (FID) score of 93.20. The generated images were visually very similar to real images and covered different levels of leafroll disease severity. The t-SNE visualization showed that the features of real and simulated images were well clustered and separated in two-dimensional space, indicating that GANformer learned meaningful and diverse features, which enriched the image dataset. Compared with CNN-based deep learning models, Transformer-based deep learning models had more advantages in diagnosing leafroll disease infection.
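The channel attention mechanism described in the Method (two fully connected layers and an activation function) can be illustrated with a minimal sketch. The abstract does not give the pooling step, the gating function, the hidden width, or the weights, so the squeeze-and-excitation style layout below (average over tokens, ReLU between the layers, sigmoid gate) and all numeric values are assumptions chosen only for illustration, not the authors' implementation.

```python
import math


def linear(x, weight, bias):
    """Fully connected layer: weight is [out][in], x is a vector of length in."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def channel_attention(tokens, w1, b1, w2, b2):
    """Reweight the channels of a token sequence.

    tokens: list of N token vectors, each holding C channel values.
    w1, b1: first fully connected layer (C -> hidden).
    w2, b2: second fully connected layer (hidden -> C).
    Returns (reweighted tokens, per-channel gates).
    """
    n = len(tokens)
    channels = len(tokens[0])
    # Assumed squeeze step: average each channel over all tokens.
    pooled = [sum(tok[c] for tok in tokens) / n for c in range(channels)]
    # Two fully connected layers with a ReLU in between (ReLU is an assumption).
    hidden = [max(0.0, h) for h in linear(pooled, w1, b1)]
    # Assumed sigmoid gate that maps each channel score into (0, 1).
    gates = [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w2, b2)]
    # Scale every token's channels by the gates to amplify informative channels.
    scaled = [[v * g for v, g in zip(tok, gates)] for tok in tokens]
    return scaled, gates


# Toy input: 3 tokens with 4 channels, hidden width 2 (hypothetical weights).
tokens = [[1.0, 0.5, -0.2, 2.0],
          [0.8, 0.1, 0.4, 1.5],
          [1.2, -0.3, 0.1, 1.0]]
w1 = [[0.5, 0.0, 0.0, 0.5],
      [0.0, 0.5, 0.5, 0.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0],
      [0.0, 1.0],
      [-1.0, 0.0],
      [0.5, 0.5]]
b2 = [0.0, 0.0, 0.0, 0.0]

scaled, gates = channel_attention(tokens, w1, b1, w2, b2)
print([round(g, 3) for g in gates])
```

In this toy setting the output keeps the shape of the input, and each channel is multiplied by a single gate shared across all tokens, which is how such a block can emphasize some channels over others.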
The Swin Transformer achieved the best accuracy, 83.97%, on the enhanced dataset, which was higher than that of other models such as GoogLeNet, MobileNetV2, NasNet Mobile, ResNet18, ResNet50, CVT, and T2TViT. Replacing the cross-entropy loss function with the ArcFace loss function improved the classification accuracy by 1.50%, and applying instance normalization instead of layer normalization further improved the accuracy by 0.30%. Moreover, the proposed CA-Swin Transformer, which adds the channel attention mechanism, enhanced the feature representation of the Swin Transformer model and achieved the highest classification accuracy on the test set, 86.65%, which was 6.54% higher than that of the Swin Transformer on the original test dataset. The distribution map of leafroll disease severity in vineyards showed a certain correlation between leafroll disease severity and grape rows. Areas with more severely infected Cabernet Sauvignon vines were more prone to missing or weak plants.
Conclusions: A novel method for diagnosing grapevine leafroll disease severity at the canopy scale using UAV remote sensing technology and deep learning was proposed. The method generates diverse and realistic virtual canopy images of grapevines with different levels of leafroll disease severity using GANformer and classifies canopy images into severity classes using the CA-Swin Transformer. It can also map the distribution of leafroll disease severity in vineyards using a sliding window method, providing a new approach for crop disease monitoring based on UAV remote sensing technology.
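The sliding window mapping step can be sketched as follows. This is an illustrative outline only: the patch size, the stride, the toy image, and the `toy_classifier` function (a simple threshold on mean pixel value standing in for the trained CA-Swin Transformer) are all hypothetical and do not come from the study.

```python
LABELS = ("healthy", "moderate infection", "severe infection")


def sliding_windows(height, width, patch, stride):
    """Yield the (row, col) top-left corner of every patch that fits in the image."""
    for row in range(0, height - patch + 1, stride):
        for col in range(0, width - patch + 1, stride):
            yield row, col


def toy_classifier(window):
    """Hypothetical stand-in for the trained model: thresholds the mean value."""
    values = [v for line in window for v in line]
    mean = sum(values) / len(values)
    if mean < 0.33:
        return LABELS[0]
    if mean < 0.66:
        return LABELS[1]
    return LABELS[2]


def severity_map(image, patch, stride, classify):
    """Classify each patch and return a {(row, col): label} severity map."""
    height, width = len(image), len(image[0])
    result = {}
    for row, col in sliding_windows(height, width, patch, stride):
        # Cut the patch out of the (orthomosaic-like) 2D image.
        window = [line[col:col + patch] for line in image[row:row + patch]]
        result[(row, col)] = classify(window)
    return result


# Toy 4 x 6 single-band "orthomosaic" with values in [0, 1].
image = [
    [0.1, 0.1, 0.5, 0.5, 0.9, 0.9],
    [0.1, 0.1, 0.5, 0.5, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1, 0.5, 0.5],
    [0.1, 0.1, 0.1, 0.1, 0.5, 0.5],
]

grid = severity_map(image, patch=2, stride=2, classify=toy_classifier)
for corner in sorted(grid):
    print(corner, grid[corner])
```

Setting the stride equal to the patch size gives non-overlapping tiles; a smaller stride would produce overlapping patches and a denser map at higher computational cost.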