645 results
Search Results
2. SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation.
- Author
- Hwang, Gyutae, Jeong, Jiwoo, and Lee, Sang Jun
- Subjects
- *CONVOLUTIONAL neural networks, *COMPUTER vision, *REMOTE sensing, *DEEP learning, *TRANSFORMER models
- Abstract
Advances in deep learning and computer vision techniques have had a substantial impact on the field of remote sensing, enabling efficient data analysis for applications such as land cover classification and change detection. Convolutional neural networks (CNNs) and transformer architectures have been utilized in visual perception algorithms due to their effectiveness in analyzing local features and global context. In this paper, we propose a hybrid transformer architecture that consists of a CNN-based encoder and a transformer-based decoder. We propose a feature adjustment module that refines the multiscale feature maps extracted from an EfficientNet backbone network. The adjusted feature maps are integrated into the transformer-based decoder to perform the semantic segmentation of remote sensing images. This paper refers to the proposed encoder–decoder architecture as a semantic feature adjustment network (SFA-Net). To demonstrate the effectiveness of the SFA-Net, experiments were thoroughly conducted with four public benchmark datasets: UAVid, ISPRS Potsdam, ISPRS Vaihingen, and LoveDA. The proposed model achieved state-of-the-art accuracy on the UAVid, ISPRS Vaihingen, and LoveDA datasets for the segmentation of remote sensing images. On the ISPRS Potsdam dataset, our method achieved comparable accuracy to the latest model while reducing the number of trainable parameters from 113.8 M to 10.7 M. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Deep-Learning-Based Daytime COT Retrieval and Prediction Method Using FY4A AGRI Data.
- Author
- Xu, Fanming, Song, Biao, Chen, Jianhua, Guan, Runda, Zhu, Rongjie, Liu, Jiayu, and Qiu, Zhongfeng
- Subjects
- *CONVOLUTIONAL neural networks, *PREDICTION models, *DEEP learning, *FORECASTING
- Abstract
The traditional method for retrieving cloud optical thickness (COT) is carried out through a Look-Up Table (LUT). Researchers must make a series of idealized assumptions and conduct extensive observations and feature recording in this scenario, consuming considerable resources. The emergence of deep learning effectively addresses the shortcomings of the traditional approach. In this paper, we first propose a daytime (SOZA < 70°) COT retrieval algorithm based on FY-4A AGRI. We establish and train a Convolutional Neural Network (CNN) model for COT retrieval, CM4CR, with CALIPSO's spatially and temporally synchronized COT product as the ground truth. Then, a deep learning method extended from video prediction models is adopted to predict COT values based on the retrieval results obtained from CM4CR. The COT prediction model (CPM) consists of an encoder, a predictor, and a decoder. On this basis, we further incorporated a time embedding module to enhance the model's ability to learn from irregular time intervals in the input COT sequence. During the training phase, we employed Charbonnier Loss and Edge Loss to enhance the model's capability to represent COT details. Experiments indicate that our CM4CR outperforms existing COT retrieval methods, with predictions showing better performance across several metrics than other benchmark prediction models. Additionally, this paper also investigates the impact of different lengths of COT input sequences and of the time intervals between adjacent COT frames on prediction performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
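The Charbonnier and Edge losses named in the abstract above have simple closed forms; a minimal sketch follows. The 1-D formulation, the ε value, and the use of first-order differences for the edge term are illustrative assumptions, not the paper's exact implementation.

```python
import math

def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier loss: a smooth, robust variant of L1,
    the mean of sqrt((p - t)^2 + eps^2) over all elements."""
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)

def edge_loss(pred, target, eps=1e-3):
    """Edge loss: the same robust penalty applied to first-order
    differences, emphasising errors in local gradients (edges)."""
    dp = [b - a for a, b in zip(pred, pred[1:])]
    dt = [b - a for a, b in zip(target, target[1:])]
    return charbonnier_loss(dp, dt, eps)
```

For identical inputs the loss bottoms out at ε rather than 0, which is what keeps its gradient well behaved near zero error.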
4. A Method for Underwater Acoustic Target Recognition Based on the Delay-Doppler Joint Feature.
- Author
- Du, Libin, Wang, Zhengkai, Lv, Zhichao, Han, Dongyue, Wang, Lei, Yu, Fei, and Lan, Qing
- Subjects
- *CONVOLUTIONAL neural networks, *ARCHITECTURAL acoustics, *OBJECT recognition (Computer vision), *FOURIER transforms
- Abstract
To solve the problem of identifying complex underwater acoustic targets from only a single Time–Frequency (TF) signal feature, this paper designs a method that recognizes underwater targets based on the Delay-Doppler joint feature. First, this method uses the symplectic finite Fourier transform (SFFT) to extract the Delay-Doppler features of underwater acoustic signals, analyzes the Time–Frequency features at the same time, and combines the Delay-Doppler (DD) feature and the Time–Frequency feature to form a joint feature (TF-DD). This paper uses three types of convolutional neural networks to verify that TF-DD can effectively improve the accuracy of target recognition. Second, this paper designs an object recognition model (TF-DD-CNN) that takes the joint feature as input, which simplifies the neural network's overall structure and improves the model's training efficiency. This research employs ship-radiated noise to validate the efficacy of TF-DD-CNN for target identification. The results demonstrate that the joint feature and the TF-DD-CNN model introduced in this study can proficiently detect ships, and the model notably enhances the precision of detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
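For readers unfamiliar with the symplectic finite Fourier transform (SFFT) mentioned above, a direct, unoptimized implementation of the textbook OTFS-style definition follows. The sign convention, normalization, and tiny grid sizes are assumptions for illustration; real pipelines would compute this with FFTs.

```python
import cmath
import math

def sfft(X, N, M):
    """SFFT: map an N x M time-frequency grid X[n][m] to delay-Doppler:
    x_dd[k][l] = 1/sqrt(N*M) * sum_{n,m} X[n][m] * exp(-2j*pi*(n*k/N - m*l/M))."""
    s = 1.0 / math.sqrt(N * M)
    return [[s * sum(X[n][m] * cmath.exp(-2j * math.pi * (n * k / N - m * l / M))
                     for n in range(N) for m in range(M))
             for l in range(M)] for k in range(N)]

def isfft(x, N, M):
    """Inverse SFFT: delay-Doppler back to time-frequency (conjugate kernel)."""
    s = 1.0 / math.sqrt(N * M)
    return [[s * sum(x[k][l] * cmath.exp(2j * math.pi * (n * k / N - m * l / M))
                     for k in range(N) for l in range(M))
             for m in range(M)] for n in range(N)]
```

The transform is unitary, so `isfft(sfft(X))` recovers the original grid, which is a convenient sanity check when wiring up a feature extractor.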
5. A Multi-Task Convolutional Neural Network Relative Radiometric Calibration Based on Temporal Information.
- Author
- Tang, Lei, Zhao, Xiangang, Hu, Xiuqing, Luo, Chuyao, and Lin, Manjun
- Subjects
- *CONVOLUTIONAL neural networks, *RADIOMETRIC methods, *COMPUTER vision, *REMOTE-sensing images, *COMPUTER simulation, *DEEP learning
- Abstract
Due to the continuous degradation of onboard satellite instruments over time, satellite images degrade as well, necessitating calibration for tasks reliant on satellite data. Previous relative radiometric calibration methods fall mainly into traditional methods and deep learning methods. The traditional methods involve complex computations for each calibration, while deep-learning-based approaches tend to oversimplify the calibration process, utilizing generic computer vision models without structures tailored to calibration tasks. In this paper, we address the unique challenges of calibration by introducing a novel approach: a multi-task convolutional neural network calibration model leveraging temporal information. This method is the first to integrate temporal dynamics into the architecture of neural network calibration models. Extensive experiments conducted on the FY3A/B/C VIRR datasets showcase the superior performance of our approach compared to the existing state-of-the-art traditional and deep learning methods. Furthermore, tests with various backbones confirm the broad applicability of our framework across different convolutional neural networks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
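The degradation-correction idea behind relative radiometric calibration can be made concrete with a minimal sketch: fit a linear trend to the sensor's response over an assumed-stable (pseudo-invariant) target, then rescale a degraded count back to the launch-day response. The multiplicative degradation model and the pseudo-invariant-target setup are illustrative assumptions, not the paper's multi-task network.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def relative_calibrate(dn, day, a, b):
    """Rescale a degraded count to the day-0 response, assuming the fitted
    trend a*day + b tracks a constant-radiance target over time."""
    return dn * b / (a * day + b)
```

Fitting on a synthetic sensor that loses 0.1% gain per day recovers the trend, and extrapolating the correction to a later day maps readings back onto the day-0 scale.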
6. Modeling and Forecasting Ionospheric foF2 Variation Based on CNN-BiLSTM-TPA during Low- and High-Solar Activity Years.
- Author
- Xu, Baoyi, Huang, Wenqiang, Ren, Peng, Li, Yi, and Xiang, Zheng
- Subjects
- *MACHINE learning, *CONVOLUTIONAL neural networks, *SOLAR activity, *IONOSPHERE, *PREDICTION models, *SOLAR cycle
- Abstract
The transmission of high-frequency signals over long distances depends on the ionosphere's reflective properties, with the selection of operating frequencies being closely tied to variations in the ionosphere. The accurate prediction of the ionospheric critical frequency foF2 and other parameters at low latitudes is of great significance for understanding ionospheric changes in high-frequency communications. Currently, deep learning algorithms demonstrate significant advantages in capturing characteristics of the ionosphere. In this paper, a state-of-the-art hybrid neural network is utilized in conjunction with a temporal pattern attention mechanism for predicting variations in the foF2 parameter during high- and low-solar-activity years. Convolutional neural networks (CNNs) and bidirectional long short-term memory (BiLSTM), which together are capable of extracting spatiotemporal features of ionospheric variations, are incorporated into a hybrid neural network. The foF2 data used for training and testing come from three observatories in Brisbane (27°53′S, 152°92′E), Darwin (12°45′S, 130°95′E) and Townsville (19°63′S, 146°85′E) in 2000, 2008, 2009 and 2014 (the peak or trough years of solar activity in solar cycles 23 and 24), recorded by the advanced Australian Digital Ionospheric Sounder. The results show that the proposed model accurately captures the changes in ionospheric foF2 characteristics and outperforms the International Reference Ionosphere 2020 (IRI-2020) and BiLSTM ionospheric prediction models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
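The temporal attention idea underlying the TPA mechanism above can be sketched as plain dot-product attention over a sequence of hidden states: score each time step against a query, softmax the scores, and take the weighted sum. This is a simplified stand-in (temporal pattern attention proper first applies convolutional filters across time), and all names here are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def temporal_attention(hidden, query):
    """Dot-product attention over hidden-state vectors: returns the
    softmax-weighted context vector and the per-step weights."""
    scores = [sum(h * q for h, q in zip(hs, query)) for hs in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * hs[d] for w, hs in zip(weights, hidden))
               for d in range(dim)]
    return context, weights
```

With a query strongly aligned to one time step, nearly all attention mass lands on that step, which is the behaviour such mechanisms exploit to pick out salient epochs in a foF2 sequence.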
7. Editorial on Special Issue "3D Reconstruction and Mobile Mapping in Urban Environments Using Remote Sensing".
- Author
- Jiang, San, Weng, Duojie, Liu, Jianchen, and Jiang, Wanshou
- Subjects
- *CONVOLUTIONAL neural networks, *SPHERICAL projection, *GEOGRAPHIC information systems, *STANDARD deviations, *GROUND penetrating radar, *DIGITAL photogrammetry, *SYNTHETIC aperture radar, *RAILROAD tunnels, *ROAD markings
- Abstract
This document is an editorial on the special issue of "3D Reconstruction and Mobile Mapping in Urban Environments Using Remote Sensing." The editorial highlights the importance of 3D reconstruction and mobile mapping in various applications such as autonomous driving, smart logistics, pedestrian navigation, and virtual reality. It discusses the emergence of remote sensing-based techniques and cutting-edge technologies like SfM, SLAM, and deep learning that have enhanced the field. The special issue includes 15 high-quality papers covering topics such as image feature matching, LiDAR/image-fused SLAM, NeRF-based scene rendering, and other applications like InSAR point cloud registration and 3D GPR for underground imaging. The editorial concludes by expressing gratitude to the authors and reviewers for their contributions and highlighting the value of this special issue for further research. [Extracted from the article]
- Published
- 2024
8. Cross-Hopping Graph Networks for Hyperspectral–High Spatial Resolution (H 2) Image Classification.
- Author
- Chen, Tao, Wang, Tingting, Chen, Huayue, Zheng, Bochuan, and Deng, Wu
- Subjects
- *CONVOLUTIONAL neural networks, *IMAGE recognition (Computer vision), *FEATURE extraction, *REMOTE sensing, *IMAGE fusion, *MULTISPECTRAL imaging
- Abstract
Remote sensing images are gradually advancing towards hyperspectral–high spatial resolution (H2) double-high images. However, higher resolution introduces serious spatial heterogeneity and spectral variability, which increases the difficulty of feature recognition. To make the best use of spectral and spatial features when labeled samples are scarce, effective recognition and accurate classification of features in H2 images are needed. In this paper, a cross-hop graph network for H2 image classification (H2-CHGN) is proposed. It is a two-branch network for deep feature extraction geared towards H2 images, consisting of a cross-hop graph attention network (CGAT) and a multiscale convolutional neural network (MCNN): the CGAT branch utilizes the superpixel information of H2 images to filter samples with high spatial relevance and designate them as the samples to be classified, then utilizes the cross-hop graph and attention mechanism to broaden the range of graph convolution to obtain more representative global features. As the other branch, the MCNN uses dual convolutional kernels to extract features and fuse them at various scales while attaining pixel-level multi-scale local features by parallel cross connecting. Finally, a dual-channel attention mechanism is utilized for fusion to make image elements more prominent. Experiments on the classical Pavia University dataset and the double-high (H2) WHU-Hi-LongKou and WHU-Hi-HongHu datasets show that the H2-CHGN can be efficiently and competently used in H2 image classification. In detail, experimental results showcase superior performance, outpacing state-of-the-art methods by 0.75–2.16% in overall accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Ionospheric TEC Prediction in China during Storm Periods Based on Deep Learning: Mixed CNN-BiLSTM Method.
- Author
- Ren, Xiaochen, Zhao, Biqiang, Ren, Zhipeng, and Xiong, Bo
- Subjects
- *CONVOLUTIONAL neural networks, *METEOROLOGICAL research, *HISTORICAL maps, *STORMS, *DEEP learning
- Abstract
Applying deep learning to high-precision ionospheric parameter prediction is a significant and growing field within the realm of space weather research. This paper proposes an improved model, Mixed Convolutional Neural Network (CNN)—Bidirectional Long Short-Term Memory (BiLSTM), for predicting the Total Electron Content (TEC) in China. This model was trained using the longest available Global Ionospheric Maps (GIM)-TEC from 1998 to 2023 in China, and underwent an interpretability analysis and accuracy evaluation. The results indicate that historical TEC maps play the most critical role, followed by Kp, ap, AE, F10.7, and time factor. The contributions of Dst and Disturbance Index (DI) to improving accuracy are relatively small but still essential. In long-term predictions, the contributions of the geomagnetic index, solar activity index, and time factor are higher. In addition, the model performs well in short-term predictions, accurately capturing the occurrence, evolution, and classification of ionospheric storms. However, as the predicted length increases, the accuracy gradually decreases, and some erroneous predictions may occur. The northeast region exhibits lower accuracy but a higher F1 score, which may be attributed to the frequency of ionospheric storm occurrences in different locations. Overall, the model effectively predicts the trends and evolution processes of ionospheric storms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. A Global Spatial-Spectral Feature Fused Autoencoder for Nonlinear Hyperspectral Unmixing.
- Author
- Zhang, Mingle, Yang, Mingyu, Xie, Hongyu, Yue, Pinliang, Zhang, Wei, Jiao, Qingbin, Xu, Liang, and Tan, Xin
- Subjects
- *CONVOLUTIONAL neural networks, *DATA mining, *FEATURE extraction, *PIXELS, *NOISE, *DEEP learning
- Abstract
Hyperspectral unmixing (HU) aims to decompose mixed pixels into a set of endmembers and corresponding abundances. Deep learning-based HU methods are currently a hot research topic, but most existing unmixing methods still rely on per-pixel training or employ convolutional neural networks (CNNs), which overlook the non-local correlations of materials and spectral characteristics. Furthermore, current research mainly focuses on linear mixing models, which limits the feature extraction capability of deep encoders and further improvement in unmixing accuracy. In this paper, we propose a nonlinear unmixing network capable of extracting global spatial-spectral features. The network is designed based on an autoencoder architecture, where a dual-stream CNN is employed in the encoder to separately extract spectral and local spatial information. The extracted features are then fused together to form a more complete representation of the input data. Subsequently, a linear projection-based multi-head self-attention mechanism is applied to capture global contextual information, allowing for comprehensive spatial information extraction while maintaining lightweight computation. To achieve better reconstruction performance, a model-free nonlinear mixing approach is adopted to enhance the model's universality, with the mixing model learned entirely from the data. Additionally, an initialization method based on endmember bundles is utilized to reduce interference from outliers and noise. Comparative results on real datasets against several state-of-the-art unmixing methods demonstrate the superiority of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
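The contrast the abstract draws between linear and nonlinear mixing can be made concrete with toy forward models. The bilinear interaction term and its γ coefficient below are one common nonlinear extension from the unmixing literature, used purely for illustration; the paper's model-free approach learns the mixing from data instead.

```python
def linear_mix(endmembers, abundances):
    """Linear mixing model: pixel = sum_i a_i * e_i (abundances are
    assumed non-negative and sum to one)."""
    bands = len(endmembers[0])
    return [sum(a * e[b] for a, e in zip(abundances, endmembers))
            for b in range(bands)]

def bilinear_mix(endmembers, abundances, gamma=0.5):
    """A simple nonlinear (bilinear) extension: adds scaled pairwise
    endmember interaction terms to the linear model."""
    pixel = linear_mix(endmembers, abundances)
    n, bands = len(endmembers), len(endmembers[0])
    for i in range(n):
        for j in range(i + 1, n):
            for b in range(bands):
                pixel[b] += (gamma * abundances[i] * abundances[j]
                             * endmembers[i][b] * endmembers[j][b])
    return pixel
```

Comparing the two outputs on the same abundances shows exactly where the nonlinearity enters: only bands where both endmembers are non-zero pick up an interaction term.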
11. An Auditory Convolutional Neural Network for Underwater Acoustic Target Timbre Feature Extraction and Recognition.
- Author
- Ni, Junshuai, Ji, Fang, Lu, Shaoqing, and Feng, Weijia
- Subjects
- *CONVOLUTIONAL neural networks, *BASILAR membrane, *FILTER banks, *AUDITORY perception, *AUDITORY selective attention
- Abstract
In order to extract the line-spectrum features of underwater acoustic targets in complex environments, an auditory convolutional neural network (ACNN) capable of frequency component perception, timbre perception and critical information perception is proposed in this paper, inspired by the human auditory perception mechanism. This model first uses a gammatone filter bank that mimics the cochlear basilar membrane excitation response to decompose the input time-domain signal into a number of sub-bands, which guides the network to perceive the line-spectrum frequency information of the underwater acoustic target. A sequence of convolution layers is then used to filter out interfering noise and enhance the line-spectrum components of each sub-band by simulating the process of calculating the energy distribution features. After that, an improved channel attention module is connected to select the line spectra most critical for recognition; in this module, a new global pooling method is proposed and applied to better extract the intrinsic properties. Finally, the sub-band information is fused using a combination layer and a single-channel convolution layer to generate a vector with the same dimensions as the input signal at the output layer. A decision module with a Softmax classifier is added after the auditory neural network and used to recognize the five classes of vessel targets in the ShipsEar dataset, achieving a recognition accuracy of 99.8%, an improvement of 2.7% over the recently proposed DRACNN method, with varying degrees of improvement over the other eight compared methods. The visualization results show that the model can significantly suppress the interfering noise intensity and selectively enhance the radiated noise line-spectrum energy of underwater acoustic targets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
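The gammatone front-end described above has a standard closed form; a sketch follows. The 4th-order filter, the 1.019·ERB bandwidth rule, and the Glasberg–Moore ERB formula are conventional choices from the auditory-modeling literature, not details taken from this paper.

```python
import math

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore) at centre frequency f Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(f, fs, duration=0.025, order=4):
    """Sampled impulse response of a gammatone filter centred at f Hz:
    g(t) = t**(order-1) * exp(-2*pi*b*t) * cos(2*pi*f*t), b = 1.019*ERB(f)."""
    b = 1.019 * erb(f)
    out = []
    for i in range(int(duration * fs)):
        t = i / fs
        out.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                   * math.cos(2 * math.pi * f * t))
    return out

# A filter bank is one impulse response per centre frequency; each sub-band
# is the convolution of the input signal with one of these responses.
```

Spacing centre frequencies on an ERB scale gives the cochlea-like decomposition the network uses to localise line-spectrum components.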
12. 1D-CNN-Transformer for Radar Emitter Identification and Implemented on FPGA.
- Author
- Gao, Xiangang, Wu, Bin, Li, Peng, and Jing, Zehuan
- Subjects
- *ARTIFICIAL neural networks, *MACHINE learning, *FIELD programmable gate arrays, *CONVOLUTIONAL neural networks, *ENERGY consumption
- Abstract
Deep learning has brought great development to radar emitter identification technology, and specific emitter identification (SEI), as a branch of radar emitter identification, has also benefited from it. However, the complexity of most deep learning algorithms makes it difficult to meet the low-power, high-performance processing requirements of SEI on embedded devices, so this article proposes solutions on both the software and hardware sides. On the software side, we design a Transformer variant network, the lightweight convolutional Transformer (LW-CT), which supports parameter sharing. We then cascade convolutional neural networks (CNNs) and the LW-CT to construct a one-dimensional CNN-Transformer (1D-CNN-Transformer) lightweight neural network model that captures the long-range dependencies of radar emitter signals while extracting their spatial-domain features. On the hardware side, we design a low-power neural network accelerator based on an FPGA to perform real-time recognition of radar emitter signals. The accelerator not only provides high-efficiency computing engines for the network, but also includes a reconfigurable buffer called "Ping-pong CBUF" and a two-level pipeline architecture for the convolution layer to alleviate the bottleneck caused by off-chip storage access bandwidth. Experimental results show that the algorithm achieves high SEI recognition performance with low computational overhead. In addition, the hardware acceleration platform not only meets the radar emitter recognition system's requirements for low power consumption and high-performance processing, but also outperforms the accelerators in other papers in terms of the energy efficiency ratio of Transformer layer processing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. Pyramid Cascaded Convolutional Neural Network with Graph Convolution for Hyperspectral Image Classification.
- Author
- Pan, Haizhu, Yan, Hui, Ge, Haimiao, Wang, Liguo, and Shi, Cuiping
- Subjects
- *CONVOLUTIONAL neural networks, *IMAGE recognition (Computer vision), *FEATURE extraction, *COMPARATIVE method, *PYRAMIDS
- Abstract
Convolutional neural networks (CNNs) and graph convolutional networks (GCNs) have made considerable advances in hyperspectral image (HSI) classification. However, most CNN-based methods learn features at a single scale in HSI data, which may be insufficient for multi-scale feature extraction in complex data scenes. To learn the relations among samples in non-grid data, GCNs are employed and combined with CNNs to process HSIs. Nevertheless, most methods based on CNN-GCN may overlook the integration of pixel-wise spectral signatures. In this paper, we propose a pyramid cascaded convolutional neural network with graph convolution (PCCGC) for hyperspectral image classification. It mainly comprises CNN-based and GCN-based subnetworks. Specifically, in the CNN-based subnetwork, a pyramid residual cascaded module and a pyramid convolution cascaded module are employed to extract multiscale spectral and spatial features separately, which can enhance the robustness of the proposed model. Furthermore, an adaptive feature-weighted fusion strategy is utilized to adaptively fuse multiscale spectral and spatial features. In the GCN-based subnetwork, a band selection network (BSNet) is used to learn the spectral signatures in the HSI using nonlinear inter-band dependencies. Then, the spectral-enhanced GCN module is utilized to extract and enhance the important features in the spectral matrix. Subsequently, a mutual-cooperative attention mechanism is constructed to align the spectral signatures of the BSNet-based matrix with the spectral-enhanced GCN-based matrix for spectral signature integration. Extensive experiments performed on four widely used real HSI datasets show that our model achieves higher classification accuracy than fourteen comparative methods, demonstrating the superior performance of PCCGC over state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. AerialFormer: Multi-Resolution Transformer for Aerial Image Segmentation.
- Author
- Hanyu, Taisei, Yamazaki, Kashu, Tran, Minh, McCann, Roy A., Liao, Haitao, Rainwater, Chase, Adkins, Meredith, Cothren, Jackson, and Le, Ngan
- Subjects
- *CONVOLUTIONAL neural networks, *TRANSFORMER models, *PROCESS capability, *IMAGE segmentation, *REMOTE sensing
- Abstract
When performing remote sensing image segmentation, practitioners often encounter various challenges, such as a strong foreground–background imbalance, the presence of tiny objects, high object density, intra-class heterogeneity, and inter-class homogeneity. To overcome these challenges, this paper introduces AerialFormer, a hybrid model that strategically combines the strengths of Transformers and Convolutional Neural Networks (CNNs). AerialFormer features an integrated CNN Stem module that preserves low-level and high-resolution features, enhancing the model's capability to process details of aerial imagery. The proposed AerialFormer is designed with a hierarchical structure, in which a Transformer encoder generates multi-scale features and a multi-dilated CNN (MDC) decoder aggregates the information from the multi-scale inputs. As a result, information is taken into account in both local and global contexts, so that powerful representations and high-resolution segmentation can be achieved. The proposed AerialFormer was evaluated on three benchmark datasets: iSAID, LoveDA, and Potsdam. Comprehensive experiments and extensive ablation studies show that the proposed AerialFormer remarkably outperforms state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
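The multi-dilated decoder idea above can be illustrated in one dimension: spacing kernel taps `dilation` samples apart grows the receptive field without adding parameters, and summing branches with different dilations aggregates multiple scales. This zero-padded, "same"-length sketch illustrates the mechanism only; it is not AerialFormer's MDC module.

```python
def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D convolution whose kernel taps are spaced
    `dilation` samples apart (out-of-range taps contribute zero)."""
    k = len(kernel)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def multi_dilated(x, kernel, dilations):
    """Sum of parallel dilated branches, as in multi-dilated decoders."""
    outs = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [sum(vals) for vals in zip(*outs)]
```

Running an impulse through the filter makes the effect visible: dilation 1 spreads it over 3 neighbours, while dilation 2 reaches 5 samples with the same 3-tap kernel.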
15. MGCET: MLP-mixer and Graph Convolutional Enhanced Transformer for Hyperspectral Image Classification.
- Author
- Al-qaness, Mohammed A. A., Wu, Guoyong, and AL-Alimi, Dalal
- Subjects
- *CONVOLUTIONAL neural networks, *TRANSFORMER models, *DATA mining, *IMAGE recognition (Computer vision), *FEATURE extraction, *SPECTRAL imaging
- Abstract
The vision transformer (ViT) has demonstrated performance comparable to that of convolutional neural networks (CNNs) in the hyperspectral image classification domain. This is achieved by transforming images into sequence data and mining global spectral-spatial information to establish remote dependencies. Nevertheless, both the ViT and CNNs have their own limitations. For instance, a CNN is constrained by the extent of its receptive field, which prevents it from fully exploiting global spatial-spectral features, while the ViT is prone to excessive distraction during the feature extraction process. To overcome the insufficient feature extraction caused by relying on a single paradigm, this paper proposes an MLP-mixer and graph convolutional enhanced transformer (MGCET), whose network consists of a spatial-spectral extraction block (SSEB), an MLP-mixer, and a graph convolutional enhanced transformer (GCET). First, spatial-spectral features are extracted using the SSEB, and then local spatial-spectral features are fused with global spatial-spectral features by the MLP-mixer. Finally, graph convolution is embedded in multi-head self-attention (MHSA) to mine spatial relationships and similarity between pixels, which further improves the modeling capability of the model. Correlation experiments were conducted on four different HSI datasets. The MGCET algorithm achieved overall accuracies (OAs) of 95.45%, 97.57%, 98.05%, and 98.52% on these datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Virtual Restoration of Ancient Mold-Damaged Painting Based on 3D Convolutional Neural Network for Hyperspectral Image.
- Author
- Wang, Sa, Cen, Yi, Qu, Liang, Li, Guanghua, Chen, Yao, and Zhang, Lifu
- Subjects
- *CONVOLUTIONAL neural networks, *STANDARD deviations, *PRESERVATION of painting, *CULTURAL values, *CULTURAL property, *DIGITAL preservation
- Abstract
Painted cultural relics hold significant historical value and are crucial in transmitting human culture. However, mold is a common issue for paper- or silk-based relics, which not only affects their preservation and longevity but also conceals the texture, patterns, and color information, hindering the transmission of their cultural value and heritage. Currently, the virtual restoration of painting relics primarily involves filling in RGB values based on neighborhood information, which can cause color distortion and other problems. Another approach treats mold as noise and employs maximum noise separation for its removal; however, eliminating the mold components and applying the inverse transformation often leads to further loss of information. To effectively achieve virtual mold removal from ancient paintings, the spectral characteristics of mold were analyzed. Based on the spectral features of mold and the cultural relic restoration philosophy of maintaining originality, a 3D CNN artifact restoration network was proposed. This network is capable of learning features in the near-infrared spectrum (NIR) and spatial dimensions to reconstruct the reflectance of the visible spectrum, achieving virtual mold removal for calligraphic and art relics. Using an ancient painting from the Qing Dynasty as a test subject, the proposed method was compared with the Inpainting, Criminisi, and inverse MNF transformation methods across three regions. Visual analysis, quantitative evaluation (root mean squared error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE)), and a classification application were used to assess the restoration accuracy. The visual results and quantitative analyses demonstrated that the proposed 3D CNN method effectively removes or mitigates mold while restoring the artwork to its authentic color in various backgrounds.
Furthermore, the color classification results indicated that the images restored with 3D CNN had the highest classification accuracy, with overall accuracies of 89.51%, 92.24%, and 93.63%, and Kappa coefficients of 0.88, 0.91, and 0.93, respectively. This research provides technological support for the digitalization and restoration of cultural artifacts, thereby contributing to the preservation and transmission of cultural heritage. [ABSTRACT FROM AUTHOR]
- Published
- 2024
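The overall accuracy and Kappa coefficients reported in the entry above are standard confusion-matrix statistics; a minimal sketch:

```python
def overall_accuracy(conf):
    """Fraction of samples on the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in conf)
    return sum(conf[i][i] for i in range(len(conf))) / total

def kappa(conf):
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e),
    where p_e is the chance agreement implied by the marginals."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    po = sum(conf[i][i] for i in range(n)) / total
    pe = sum(sum(conf[i]) * sum(conf[k][i] for k in range(n))
             for i in range(n)) / total ** 2
    return (po - pe) / (1 - pe)
```

Kappa discounts agreement that the class marginals would produce by chance, which is why it is reported alongside overall accuracy for classification maps with unbalanced classes.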
17. TC–Radar: Transformer–CNN Hybrid Network for Millimeter-Wave Radar Object Detection.
- Author
- Jia, Fengde, Li, Chenyang, Bi, Siyi, Qian, Junhui, Wei, Leizhe, and Sun, Guohao
- Subjects
- *CONVOLUTIONAL neural networks, *OBJECT recognition (Computer vision), *TRANSFORMER models, *DATA integration, *NETWORK performance, *INTELLIGENT transportation systems
- Abstract
In smart transportation, assisted driving relies on data integration from various sensors, notably LiDAR and cameras. However, their optical performance can degrade under adverse weather conditions, potentially compromising vehicle safety. Millimeter-wave radar, which can overcome these issues more economically, has been re-evaluated. Despite this, developing an accurate detection model is challenging due to significant noise interference and limited semantic information. To address these practical challenges, this paper presents the TC–Radar model, a novel approach that synergistically integrates the strengths of transformer and the convolutional neural network (CNN) to optimize the sensing potential of millimeter-wave radar in smart transportation systems. The rationale for this integration lies in the complementary nature of CNNs, which are adept at capturing local spatial features, and transformers, which excel at modeling long-range dependencies and global context within data. This hybrid approach allows for a more robust and accurate representation of radar signals, leading to enhanced detection performance. A key innovation of our approach is the introduction of the Cross-Attention (CA) module, which facilitates efficient and dynamic information exchange between the encoder and decoder stages of the network. This CA mechanism ensures that critical features are accurately captured and transferred, thereby significantly improving the overall network performance. In addition, the model contains the dense information fusion block (DIFB) to further enrich the feature representation by integrating different high-frequency local features. This integration process ensures thorough incorporation of key data points. 
Extensive tests conducted on the CRUW and CARRADA datasets validate the strengths of this method, with the model achieving an average precision (AP) of 83.99% and a mean intersection over union (mIoU) of 45.2%, demonstrating robust radar sensing capabilities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
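The scaled dot-product exchange at the heart of a cross-attention (CA) module like the one described in the TC–Radar abstract above can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical shapes: keys and values are shared and the learned projection matrices are omitted, so it is a sketch of the general mechanism, not the paper's actual module.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k=None):
    # Queries from one stage attend over keys/values from another stage.
    d_k = d_k or queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ keys_values                      # (n_q, d)

rng = np.random.default_rng(0)
dec = rng.standard_normal((4, 8))   # decoder tokens (queries)
enc = rng.standard_normal((6, 8))   # encoder tokens (keys = values here)
out = cross_attention(dec, enc)
print(out.shape)  # → (4, 8)
```

Each decoder token becomes a convex combination of encoder tokens, which is what lets the decoder stage pull in features captured by the encoder.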
18. Fire-Net: Rapid Recognition of Forest Fires in UAV Remote Sensing Imagery Using Embedded Devices.
- Author
-
Li, Shouliang, Han, Jiale, Chen, Fanghui, Min, Rudong, Yi, Sixue, and Yang, Zhen
- Subjects
- *
FOREST fires , *CONVOLUTIONAL neural networks , *FOREST monitoring , *DRONE aircraft , *WILDFIRES - Abstract
Forest fires pose a catastrophic threat to Earth's ecology as well as to human beings. Timely and accurate monitoring of forest fires can significantly reduce potential casualties and property damage. Thus, to address the aforementioned problems, this paper proposes a lightweight forest fire recognition model for unmanned aerial vehicles (UAVs), Fire-Net, which has a multi-stage structure and incorporates cross-channel attention following the fifth stage. This enables the model to perceive features at various scales, particularly small-scale fire sources in wild forest scenes. Through training and testing on a real-world dataset, various lightweight convolutional neural networks were evaluated on embedded devices. The experimental outcomes indicate that Fire-Net attained an accuracy of 98.18%, a precision of 99.14%, and a recall of 98.01%, surpassing the current leading methods. Furthermore, the model showcases an average inference time of 10 milliseconds per image and operates at 86 frames per second (FPS) on embedded devices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Real-Time Wildfire Monitoring Using Low-Altitude Remote Sensing Imagery.
- Author
-
Tong, Hongwei, Yuan, Jianye, Zhang, Jingjing, Wang, Haofei, and Li, Teng
- Subjects
- *
CONVOLUTIONAL neural networks , *TRANSFORMER models , *DRONE aircraft , *SUMMER , *REMOTE sensing , *FIRE detectors - Abstract
With rising global temperatures, wildfires frequently occur worldwide during the summer season. The timely detection of these fires, based on unmanned aerial vehicle (UAV) images, can significantly reduce the damage they cause. Existing Convolutional Neural Network (CNN)-based fire detection methods usually use multiple convolutional layers to enhance the receptive fields, but this compromises real-time performance. This paper proposes a novel real-time semantic segmentation network called FireFormer, combining the strengths of CNNs and Transformers to detect fires. An agile ResNet18 is adopted as the encoder, tailored for efficient fire segmentation, and a Forest Fire Transformer Block (FFTB) rooted in the Transformer architecture is proposed as the decoding mechanism. Additionally, to accurately detect and segment small fire spots, we have developed a novel Feature Refinement Network (FRN) to enhance fire segmentation accuracy. The experimental results demonstrate that our proposed FireFormer achieves state-of-the-art performance on the publicly available forest fire dataset FLAME—specifically, with an impressive 73.13% IoU and 84.48% F1 Score. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
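The IoU and F1 Score reported for FireFormer above are standard segmentation metrics. As a reference, they can be computed from binary masks as follows (a simple sketch that ignores the per-class averaging a real benchmark would apply):

```python
import numpy as np

def iou_f1(pred, target):
    # IoU and F1 for binary segmentation masks.
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # true positives
    fp = np.logical_and(pred, ~target).sum()  # false positives
    fn = np.logical_and(~pred, target).sum()  # false negatives
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return iou, f1

pred   = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(iou_f1(pred, target))  # tp=2, fp=1, fn=1 → IoU=0.5, F1≈0.667
```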
20. MBT-UNet: Multi-Branch Transform Combined with UNet for Semantic Segmentation of Remote Sensing Images.
- Author
-
Liu, Bin, Li, Bing, Sreeram, Victor, and Li, Shuofeng
- Subjects
- *
CONVOLUTIONAL neural networks , *TRANSFORMER models , *REMOTE sensing , *ENVIRONMENTAL monitoring , *RESOURCE management - Abstract
Remote sensing (RS) images play an indispensable role in many key fields such as environmental monitoring, precision agriculture, and urban resource management. Traditional deep convolutional neural networks have the problem of limited receptive fields. To address this problem, this paper introduces a hybrid network model that combines the advantages of CNN and Transformer, called MBT-UNet. First, a multi-branch encoder design based on the pyramid vision transformer (PVT) is proposed to effectively capture multi-scale feature information; second, an efficient feature fusion module (FFM) is proposed to optimize the collaboration and integration of features at different scales; finally, in the decoder stage, a multi-scale upsampling module (MSUM) is proposed to further refine the segmentation results and enhance segmentation accuracy. We conduct experiments on the ISPRS Vaihingen dataset, the Potsdam dataset, the LoveDA dataset, and the UAVid dataset. Experimental results show that MBT-UNet surpasses state-of-the-art algorithms in key performance indicators, confirming its superior performance in high-precision remote sensing image segmentation tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Radar Emitter Recognition Based on Spiking Neural Networks.
- Author
-
Luo, Zhenghao, Wang, Xingdong, Yuan, Shuo, and Liu, Zhangmeng
- Subjects
- *
ARTIFICIAL neural networks , *RADAR signal processing , *MILITARY electronics , *CONVOLUTIONAL neural networks , *ELECTRONIC measurements - Abstract
Efficient and effective radar emitter recognition is critical for electronic support measurement (ESM) systems. However, in complex electromagnetic environments, intercepted pulse trains generally contain substantial data noise, including spurious and missing pulses. Currently, radar emitter recognition methods utilizing traditional artificial neural networks (ANNs) like CNNs and RNNs are susceptible to data noise and require intensive computations, posing challenges to meeting the performance demands of modern ESM systems. Spiking neural networks (SNNs) exhibit stronger representational capabilities compared to traditional ANNs due to the temporal dynamics of spiking neurons and richer information encoded in precise spike timing. Furthermore, SNNs achieve higher computational efficiency by performing event-driven sparse addition calculations. In this paper, a lightweight spiking neural network is proposed by combining direct coding, leaky integrate-and-fire (LIF) neurons, and surrogate gradients to recognize radar emitters. Additionally, an improved SNN for radar emitter recognition is proposed, leveraging the local timing structure of pulses to enhance adaptability to data noise. Simulation results demonstrate the superior performance of the proposed method over existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
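The leaky integrate-and-fire (LIF) dynamics mentioned in the abstract above can be illustrated with a minimal scalar simulation. This is a hedged sketch with hand-picked constants, Euler integration, and a hard reset; real SNN implementations train through surrogate gradients, which are omitted here.

```python
import numpy as np

def lif_spikes(current, tau=10.0, v_th=1.0, v_reset=0.0, dt=1.0):
    # Membrane potential decays toward v_reset with time constant tau,
    # integrates input current, and emits a spike on reaching threshold.
    v, spikes = 0.0, []
    for i in current:
        v = v + dt * (-(v - v_reset) / tau + i)  # leaky integration step
        if v >= v_th:
            spikes.append(1)
            v = v_reset                          # hard reset after spiking
        else:
            spikes.append(0)
    return spikes

# Constant drive: the neuron charges up, fires, resets, and repeats.
train = lif_spikes([0.3] * 10)
print(train)  # → [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

The precise spike timing, rather than just the firing rate, is the "richer information" the abstract refers to.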
22. A Novel Mamba Architecture with a Semantic Transformer for Efficient Real-Time Remote Sensing Semantic Segmentation.
- Author
-
Ding, Hao, Xia, Bo, Liu, Weilin, Zhang, Zekai, Zhang, Jinglin, Wang, Xing, and Xu, Sen
- Subjects
- *
CONVOLUTIONAL neural networks , *REMOTE sensing , *TRANSFORMER models , *COMPUTATIONAL complexity , *EARTHQUAKES - Abstract
Real-time remote sensing segmentation technology is crucial for unmanned aerial vehicles (UAVs) in battlefield surveillance, land characterization observation, earthquake disaster assessment, etc., and can significantly enhance the application value of UAVs in military and civilian fields. To realize this potential, it is essential to develop real-time semantic segmentation methods that can be applied to resource-limited platforms, such as edge devices. The majority of mainstream real-time semantic segmentation methods rely on convolutional neural networks (CNNs) and transformers. However, CNNs cannot effectively capture long-range dependencies, while transformers have high computational complexity. This paper proposes a novel remote sensing Mamba architecture for real-time segmentation tasks in remote sensing, named RTMamba. Specifically, the backbone utilizes a Visual State-Space (VSS) block to extract deep features and maintains linear computational complexity, thereby capturing long-range contextual information. Additionally, a novel Inverted Triangle Pyramid Pooling (ITP) module is incorporated into the decoder. The ITP module can effectively filter redundant feature information and enhance the perception of objects and their boundaries in remote sensing images. Extensive experiments were conducted on three challenging aerial remote sensing segmentation benchmarks, including Vaihingen, Potsdam, and LoveDA. The results show that RTMamba achieves competitive performance advantages in terms of segmentation accuracy and inference speed compared to state-of-the-art CNN and transformer methods. To further validate the deployment potential of the model on embedded devices with limited resources, such as UAVs, we conducted tests on the Jetson AGX Orin edge device. The experimental results demonstrate that RTMamba achieves impressive real-time segmentation performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Intrapulse Modulation Radar Signal Recognition Using CNN with Second-Order STFT-Based Synchrosqueezing Transform.
- Author
-
Dong, Ning, Jiang, Hong, Liu, Yipeng, and Zhang, Jingtao
- Subjects
- *
CONVOLUTIONAL neural networks , *SIGNAL classification , *FOURIER transforms , *SIGNAL-to-noise ratio , *RADAR , *PHOTOPLETHYSMOGRAPHY - Abstract
Intrapulse modulation classification of radar signals plays an important role in modern electronic reconnaissance, countermeasures, etc. In this paper, to improve the recognition rate at low signal-to-noise ratio (SNR), we propose a recognition method using the second-order short-time Fourier transform (STFT)-based synchrosqueezing transform (FSST2) combined with a modified convolutional neural network, which we name MeNet. In particular, the radar signals are first preprocessed via time–frequency analysis with the STFT-based FSST2. Then, the informative features of the time–frequency images (TFIs) are deeply learned and classified by MeNet with several specific convolutional blocks. The simulation results show that the overall recognition rate for seven types of intrapulse modulation radar signals can reach 95.6%, even when the SNR is −12 dB. Compared with other networks, the excellent recognition rate proves the superiority of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
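As background for the time–frequency images (TFIs) used in the entry above, a plain magnitude STFT, the first-order transform that the FSST2 sharpens, can be sketched as follows. The synchrosqueezing step itself is substantially more involved and is not shown; the signal and parameters are illustrative.

```python
import numpy as np

def stft_mag(x, win_len=64, hop=16):
    # Magnitude short-time Fourier transform with a Hann window:
    # the kind of time-frequency image fed to a CNN classifier.
    window = np.hanning(win_len)
    frames = [x[s:s + win_len] * window
              for s in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1)).T  # (freq, time)

# A linear-frequency chirp, a common intrapulse modulation type.
t = np.arange(1024) / 1024.0
chirp = np.cos(2 * np.pi * (50 * t + 100 * t ** 2))
tfi = stft_mag(chirp)
print(tfi.shape)  # → (33, 61)
```

In the resulting image the chirp appears as a ridge whose frequency rises over time; synchrosqueezing reassigns energy to make that ridge sharper at low SNR.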
24. BAFormer: A Novel Boundary-Aware Compensation UNet-like Transformer for High-Resolution Cropland Extraction.
- Author
-
Li, Zhiyong, Wang, Youming, Tian, Fa, Zhang, Junbo, Chen, Yijie, and Li, Kunhong
- Subjects
- *
CONVOLUTIONAL neural networks , *TRANSFORMER models , *DEEP learning , *REMOTE sensing , *FARMS - Abstract
Utilizing deep learning for semantic segmentation of cropland from remote sensing imagery has become a crucial technique in land surveys. Cropland is highly heterogeneous and fragmented, and existing methods often suffer from inaccurate boundary segmentation. This paper introduces a UNet-like boundary-aware compensation model (BAFormer). Cropland boundaries typically exhibit rapid transformations in pixel values and texture features, often appearing as high-frequency features in remote sensing images. To enhance the recognition of these high-frequency features as represented by cropland boundaries, the proposed BAFormer integrates a Feature Adaptive Mixer (FAM) and develops a Depthwise Large Kernel Multi-Layer Perceptron model (DWLK-MLP) to enrich the global and local cropland boundary features separately. Specifically, FAM enhances the boundary-aware method by adaptively acquiring high-frequency features, leveraging the complementary advantages of convolution and self-attention, while DWLK-MLP further supplements boundary position information using a large receptive field. The efficacy of BAFormer has been evaluated on datasets including Vaihingen, Potsdam, LoveDA, and Mapcup. It demonstrates high performance, achieving mIoU scores of 84.5%, 87.3%, 53.5%, and 83.1% on these datasets, respectively. Notably, BAFormer-T (lightweight model) surpasses other lightweight models on the Vaihingen dataset with scores of 91.3% F1 and 84.1% mIoU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Graph Neural Networks in Point Clouds: A Survey.
- Author
-
Li, Dilong, Lu, Chenghui, Chen, Ziyi, Guan, Jianlong, Zhao, Jing, and Du, Jixiang
- Subjects
- *
GRAPH neural networks , *CONVOLUTIONAL neural networks , *NATURAL language processing , *OBJECT recognition (Computer vision) , *TRANSFORMER models - Abstract
With the advancement of 3D sensing technologies, point clouds are gradually becoming the main type of data representation in applications such as autonomous driving, robotics, and augmented reality. Nevertheless, the irregularity inherent in point clouds presents numerous challenges for traditional deep learning frameworks. Graph neural networks (GNNs) have demonstrated their tremendous potential in processing graph-structured data and are widely applied in various domains including social media data analysis, molecular structure calculation, and computer vision. GNNs, with their capability to handle non-Euclidean data, offer a novel approach for addressing these challenges. Additionally, drawing inspiration from the achievements of transformers in natural language processing, graph transformers have propelled models towards global awareness, overcoming the limitations of local aggregation mechanisms inherent in early GNN architectures. This paper provides a comprehensive review of GNNs and graph-based methods in point cloud applications, adopting a task-oriented perspective to analyze this field. We categorize GNN methods for point clouds based on fundamental tasks, such as segmentation, classification, object detection, registration, and other related tasks. For each category, we summarize the existing mainstream methods, conduct a comprehensive analysis of their performance on various datasets, and discuss the development trends and future prospects of graph-based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
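The local aggregation mechanism that the survey above contrasts with globally aware graph transformers can be illustrated by a single mean-aggregation message-passing layer. This is a generic GCN-style sketch on a toy graph, not any specific surveyed method.

```python
import numpy as np

def gnn_layer(adj, features, weight):
    # One message-passing layer: each node averages its neighbours'
    # features (plus its own, via self-loops), then applies a linear
    # map followed by ReLU.
    adj_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)
    aggregated = (adj_hat / deg) @ features       # mean aggregation
    return np.maximum(aggregated @ weight, 0.0)   # ReLU

# Tiny 3-node graph (e.g. k-NN edges in a point cloud): 0-1 and 1-2.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.array([[1., 0.], [0., 1.], [1., 1.]])
w = np.eye(2)   # identity weights, so only the aggregation is visible
print(gnn_layer(adj, feats, w))
```

Because each layer only mixes one-hop neighbours, information propagates a single edge per layer; this locality is exactly the limitation that graph transformers address with global attention.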
26. A Study on the Object-Based High-Resolution Remote Sensing Image Classification of Crop Planting Structures in the Loess Plateau of Eastern Gansu Province.
- Author
-
Yang, Rui, Qi, Yuan, Zhang, Hui, Wang, Hongwei, Zhang, Jinlong, Ma, Xiaofang, Zhang, Juan, and Ma, Chao
- Subjects
- *
IMAGE recognition (Computer vision) , *CONVOLUTIONAL neural networks , *REMOTE sensing , *CROPS , *STANDARD deviations , *IMAGE segmentation , *CROP quality , *PRECISION farming - Abstract
The timely and accurate acquisition of information on the distribution of the crop planting structure in the Loess Plateau of eastern Gansu Province, one of the most important agricultural areas in Western China, is crucial for promoting fine management of agriculture and ensuring food security. This study uses multi-temporal high-resolution remote sensing images to determine optimal segmentation scales for various crops, employing the estimation of scale parameter 2 (ESP2) tool and the Ratio of Mean Absolute Deviation to Standard Deviation (RMAS) model. The Canny edge detection algorithm is then applied for multi-scale image segmentation. By incorporating crop phenological factors and using the L1-regularized logistic regression model, we optimized 39 spatial feature factors—including spectral, textural, geometric, and index features. Within a multi-level classification framework, the Random Forest (RF) classifier and Convolutional Neural Network (CNN) model are used to classify the cropping patterns in four test areas based on the multi-scale segmented images. The results indicate that integrating the Canny edge detection algorithm with the optimal segmentation scales calculated using the ESP2 tool and RMAS model produces crop parcels with more complete boundaries and better separability. Additionally, optimizing spatial features using the L1-regularized logistic regression model, combined with phenological information, enhances classification accuracy. Within the object-based image classification (OBIC) framework, the RF classifier achieves higher accuracy in classifying cropping patterns. The overall classification accuracies for the four test areas are 91.93%, 94.92%, 89.37%, and 90.68%, respectively. This paper introduces crop phenological factors, effectively improving the extraction precision of the fragmented agricultural planting structure in the Loess Plateau of eastern Gansu Province.
Its findings have important application value in crop monitoring, management, food security and other related fields. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
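The RMAS statistic named above is, as the name states, a ratio of the mean absolute deviation to the standard deviation. A plausible per-segment computation looks like this; the paper's exact formulation (e.g. how values are pooled across bands and objects) may differ.

```python
import numpy as np

def rmas(values):
    # Ratio of Mean Absolute Deviation to Standard Deviation for a set
    # of values, e.g. per-band segment means at one segmentation scale.
    values = np.asarray(values, dtype=float)
    mad = np.abs(values - values.mean()).mean()
    return mad / values.std()

# For Gaussian data the ratio approaches sqrt(2/pi) ≈ 0.798; heavier
# tails (more outlier-like segments) push it lower, which is what makes
# the ratio usable as a scale-selection statistic.
rng = np.random.default_rng(42)
print(rmas(rng.standard_normal(100_000)))
```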
27. Application of Deep Learning for Segmenting Seepages in Levee Systems.
- Author
-
Panta, Manisha, Thapa, Padam Jung, Hoque, Md Tamjidul, Niles, Kendall N., Sloan, Steve, Flanagin, Maik, Pathak, Ken, and Abdelguerfi, Mahdi
- Subjects
- *
DEEP learning , *CONVOLUTIONAL neural networks , *LEVEES - Abstract
Seepage is a typical hydraulic factor that can initiate the breaching process in a levee system. If not identified and treated on time, seepages can be a severe problem for levees, weakening the levee structure and eventually leading to collapse. Therefore, it is essential always to be vigilant with regular monitoring procedures to identify seepages throughout these levee systems and perform adequate repairs to limit potential threats from unforeseen levee failures. This paper introduces a fully convolutional neural network to identify and segment seepage from the image in levee systems. To the best of our knowledge, this is the first work in this domain. Applying deep learning techniques for semantic segmentation tasks in real-world scenarios has its own challenges, especially the difficulty for models to effectively learn from complex backgrounds while focusing on simpler objects of interest. This challenge is particularly evident in the task of detecting seepages in levee systems, where the fault is relatively simple compared to the complex and varied background. We addressed this problem by introducing negative images and a controlled transfer learning approach for semantic segmentation for accurate seepage segmentation in levee systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Deep Learning-Based Detection of Oil Spills in Pakistan's Exclusive Economic Zone from January 2017 to December 2023.
- Author
-
Basit, Abdul, Siddique, Muhammad Adnan, Bashir, Salman, Naseer, Ehtasham, and Sarfraz, Muhammad Saquib
- Subjects
- *
CONVOLUTIONAL neural networks , *OIL spills , *OIL seepage , *ALGAL blooms , *TOXIC algae , *MARINE accidents , *INSPECTION & review - Abstract
Oil spillages on a sea's or an ocean's surface are a threat to marine and coastal ecosystems. They are mainly caused by ship accidents, illegal discharge of oil from ships during cleaning and oil seepage from natural reservoirs. Synthetic-Aperture Radar (SAR) has proved to be a useful tool for analyzing oil spills, because it operates in all-day, all-weather conditions. An oil spill can typically be seen as a dark stretch in SAR images and can often be detected through visual inspection. The major challenge is to differentiate oil spills from look-alikes, i.e., low-wind areas, algae blooms and grease ice, etc., that have a dark signature similar to that of an oil spill. It has been noted over time that oil spill events in Pakistan's territorial waters often remain undetected until the oil reaches the coastal regions or it is located by concerned authorities during patrolling. A formal remote sensing-based operational framework for oil spills detection in Pakistan's Exclusive Economic Zone (EEZ) in the Arabian Sea is urgently needed. In this paper, we report the use of an encoder–decoder-based convolutional neural network trained on an annotated dataset comprising selected oil spill events verified by the European Maritime Safety Agency (EMSA). The dataset encompasses multiple classes, viz., sea surface, oil spill, look-alikes, ships and land. We processed Sentinel-1 acquisitions over the EEZ from January 2017 to December 2023, and we thereby prepared a repository of SAR images for the aforementioned duration. This repository contained images that had been vetted by SAR experts, to trace and confirm oil spills. We tested the repository using the trained model, and, to our surprise, we detected 92 previously unreported oil spill events within those seven years. In 2020, our model detected 26 oil spills in the EEZ, which corresponds to the highest number of spills detected in a single year; whereas in 2023, our model detected 10 oil spill events. 
In terms of the total surface area covered by the spills, the worst year was 2021, with a cumulative 395 sq. km covered in oil or an oil-like substance. On the whole, these are alarming figures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Global-Local Collaborative Learning Network for Optical Remote Sensing Image Change Detection.
- Author
-
Li, Jinghui, Shao, Feng, Liu, Qiang, and Meng, Xiangchao
- Subjects
- *
OPTICAL remote sensing , *COLLABORATIVE learning , *CONVOLUTIONAL neural networks , *TRANSFORMER models , *REMOTE sensing , *ARTIFICIAL satellites - Abstract
Due to the widespread applications of change detection technology in urban change analysis, environmental monitoring, agricultural surveillance, disaster detection, and other domains, the task of change detection has become one of the primary applications of Earth orbit satellite remote sensing data. However, the analysis of dual-temporal change detection (CD) remains a challenge in high-resolution optical remote sensing images due to the complexities in remote sensing images, such as intricate textures, seasonal variations in imaging time, climatic differences, and significant differences in the sizes of various objects. In this paper, we propose a novel U-shaped architecture for change detection. In the encoding stage, a multi-branch feature extraction module is employed by combining CNN and transformer networks to enhance the network's perception capability for objects of varying sizes. Furthermore, a multi-branch aggregation module is utilized to aggregate features from different branches, providing the network with global attention while preserving detailed information. For dual-temporal features, we introduce a spatiotemporal discrepancy perception module to model the context of dual-temporal images. Particularly noteworthy is the construction of channel attention and token attention modules based on the transformer attention mechanism to facilitate information interaction between multi-level features, thereby enhancing the network's contextual awareness. The effectiveness of the proposed network is validated on three public datasets, demonstrating its superior performance over other state-of-the-art methods through qualitative and quantitative experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Lightweight Pedestrian Detection Network for UAV Remote Sensing Images Based on Strideless Pooling.
- Author
-
Liu, Sanzai, Cao, Lihua, and Li, Yi
- Subjects
- *
OBJECT recognition (Computer vision) , *PEDESTRIANS , *TRAFFIC monitoring , *CONVOLUTIONAL neural networks , *EMERGENCY management - Abstract
The need for pedestrian target detection in uncrewed aerial vehicle (UAV) remote sensing images has become increasingly significant as the technology continues to evolve. UAVs equipped with high-resolution cameras can capture detailed imagery of various scenarios, making them ideal for monitoring and surveillance applications. Pedestrian detection is particularly crucial in scenarios such as traffic monitoring, security surveillance, and disaster response, where the safety and well-being of individuals are paramount. However, pedestrian detection in UAV remote sensing images poses several challenges. Firstly, the small size of pedestrians relative to the overall image, especially at higher altitudes, makes them difficult to detect. Secondly, the varying backgrounds and lighting conditions in remote sensing images can further complicate the task of detection. Traditional object detection methods often struggle to handle these complexities, resulting in decreased detection accuracy and increased false positives. Addressing the aforementioned concerns, this paper proposes a lightweight object detection model that integrates GhostNet and YOLOv5s. Building upon this foundation, we further introduce the SPD-Conv module to the model. With this addition, the aim is to preserve fine-grained features of the images during downsampling, thereby enhancing the model's capability to recognize small-scale objects. Furthermore, the coordinate attention module is introduced to further improve the model's recognition accuracy. In the proposed model, the number of parameters is successfully reduced to 4.77 M, compared with 7.01 M in YOLOv5s, representing a 32% reduction. The mean average precision (mAP) increased from 0.894 to 0.913, reflecting a 1.9% improvement. We have named the proposed model "GSC-YOLO". 
This study holds significant importance in advancing lightweight UAV target detection models and in addressing the challenges of object detection in complex scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
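The SPD-Conv module cited above replaces strided convolution with a space-to-depth step so that downsampling discards no pixels. The rearrangement itself can be sketched as follows (channel-first layout assumed; the non-strided convolution that follows in SPD-Conv is omitted):

```python
import numpy as np

def space_to_depth(x, block=2):
    # Strideless downsampling: move each block x block spatial patch
    # into the channel dimension, halving H and W while preserving
    # every pixel value.
    c, h, w = x.shape
    x = x.reshape(c, h // block, block, w // block, block)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * block * block,
                                              h // block, w // block)

x = np.arange(16, dtype=float).reshape(1, 4, 4)  # one 4x4 channel
y = space_to_depth(x)
print(y.shape)  # → (4, 2, 2)
```

Unlike stride-2 convolution or pooling, no fine-grained information is lost, which is why the technique helps with small-scale objects such as distant pedestrians.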
31. Joint Classification of Hyperspectral and LiDAR Data Based on Adaptive Gating Mechanism and Learnable Transformer.
- Author
-
Wang, Minhui, Sun, Yaxiu, Xiang, Jianhong, Sun, Rui, and Zhong, Yu
- Subjects
- *
TRANSFORMER models , *CONVOLUTIONAL neural networks , *LIDAR , *DIGITAL elevation models , *TRANSFER matrix , *DATA fusion (Statistics) - Abstract
Utilizing multi-modal data, as opposed to only hyperspectral image (HSI), enhances target identification accuracy in remote sensing. Transformers are applied to multi-modal data classification for their long-range dependency but often overlook intrinsic image structure by directly flattening image blocks into vectors. Moreover, as the encoder deepens, unprofitable information negatively impacts classification performance. Therefore, this paper proposes a learnable transformer with an adaptive gating mechanism (AGMLT). Firstly, a spectral–spatial adaptive gating mechanism (SSAGM) is designed to comprehensively extract the local information from images. It mainly contains point depthwise attention (PDWA) and asymmetric depthwise attention (ADWA). The former is for extracting spectral information of HSI, and the latter is for extracting spatial information of HSI and elevation information of LiDAR-derived rasterized digital surface models (LiDAR-DSM). By omitting linear layers, local continuity is maintained. Then, LayerScale and a learnable transition matrix are introduced into the original transformer encoder and self-attention to form the learnable transformer (L-Former). It improves data dynamics and prevents performance degradation as the encoder deepens. Subsequently, learnable cross-attention (LC-Attention) with the learnable transfer matrix is designed to augment the fusion of multi-modal data by enriching feature information. Finally, poly loss, known for its adaptability with multi-modal data, is employed in training the model. Experiments in the paper are conducted on four well-known multi-modal datasets: Trento (TR), MUUFL (MU), Augsburg (AU), and Houston2013 (HU). The results show that AGMLT achieves optimal performance over some existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Vulnerable Road User Skeletal Pose Estimation Using mmWave Radars.
- Author
-
Zeng, Zhiyuan, Liang, Xingdong, Li, Yanlei, and Dang, Xiangwei
- Subjects
- *
ROAD users , *TRACKING radar , *RADAR targets , *CONVOLUTIONAL neural networks , *RADAR signal processing , *DATA augmentation - Abstract
A skeletal pose estimation method, named RVRU-Pose, is proposed to estimate the skeletal pose of vulnerable road users based on distributed non-coherent mmWave radar. In view of the limitation that existing methods for skeletal pose estimation are only applicable to small scenes, this paper proposes a strategy that combines radar intensity heatmaps and coordinate heatmaps as input to a deep learning network. In addition, we design a multi-resolution data augmentation and training method suitable for radar to achieve target pose estimation for remote and multi-target application scenarios. Experimental results show that RVRU-Pose can achieve better than 2 cm average localization accuracy for different subjects in different scenarios, which is superior in terms of accuracy and time compared to existing state-of-the-art methods for human skeletal pose estimation with radar. As an essential performance parameter of radar, the impact of angular resolution on the estimation accuracy of a skeletal pose is quantitatively analyzed and evaluated in this paper. Finally, RVRU-Pose has also been extended to the task of estimating the skeletal pose of a cyclist, reflecting the strong scalability of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. LDnADMM-Net: A Denoising Unfolded Deep Neural Network for Direction-of-Arrival Estimations in A Low Signal-to-Noise Ratio.
- Author
-
Liang, Can, Liu, Mingxuan, Li, Yang, Wang, Yanhua, and Hu, Xueyao
- Subjects
- *
DIRECTION of arrival estimation , *CONVOLUTIONAL neural networks , *SIGNAL-to-noise ratio , *COMPRESSED sensing , *SIGNAL denoising - Abstract
In this paper, we explore the problem of direction-of-arrival (DOA) estimation for a non-uniform linear array (NULA) under strong noise. The compressed sensing (CS)-based methods are widely used in NULA DOA estimations. However, these methods commonly rely on the tuning of parameters, which are hard to fine-tune. Additionally, these methods lack robustness under strong noise. To address these issues, this paper proposes a novel DOA estimation approach using a deep neural network (DNN) for a NULA in a low SNR. The proposed network is designed based on the denoising convolutional neural network (DnCNN) and the alternating direction method of multipliers (ADMM), which is dubbed as LDnADMM-Net. First, we construct an unfolded DNN architecture that mimics the behavior of the iterative processing of an ADMM. In this way, the parameters of an ADMM can be transformed into the network weights, and thus we can adaptively optimize these parameters through network training. Then, we employ the DnCNN to develop a denoising module (DnM) and integrate it into the unfolded DNN. Using this DnM, we can enhance the anti-noise ability of the proposed network and obtain a robust DOA estimation in a low SNR. The simulation and experimental results show that the proposed LDnADMM-Net can obtain high-accuracy and super-resolution DOA estimations for a NULA with strong robustness in a low signal-to-noise ratio (SNR). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
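The idea of unfolding an iterative solver into network layers, which LDnADMM-Net applies to the ADMM, can be illustrated with a simpler unrolled ISTA loop for sparse recovery. In this sketch the per-iteration step sizes and thresholds are hand-set; in an unfolded network they would become trainable weights, and the paper additionally inserts a DnCNN-based denoising module, which is omitted here.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm: the sparsifying step that
    # ADMM/ISTA apply once per iteration.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def unfolded_ista(A, y, steps, lams):
    # A fixed number of unrolled iterations; learned unfolding turns
    # each (step, lam) pair into the weights of one network layer.
    x = np.zeros(A.shape[1])
    for step, lam in zip(steps, lams):
        x = soft_threshold(x - step * A.T @ (A @ x - y), lam)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 50)) / np.sqrt(30)  # measurement matrix
x_true = np.zeros(50)
x_true[[5, 20]] = [2.0, -1.5]                    # 2-sparse ground truth
y = A @ x_true
x_hat = unfolded_ista(A, y, steps=[0.1] * 40, lams=[0.02] * 40)
print(np.flatnonzero(np.abs(x_hat) > 0.5))       # recovered support
```

Fixing the iteration count and learning the per-layer parameters is what removes the hand-tuning burden the abstract describes for CS-based DOA estimators.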
34. Target Detection Method for High-Frequency Surface Wave Radar RD Spectrum Based on (VI)CFAR-CNN and Dual-Detection Maps Fusion Compensation.
- Author
-
Ji, Yuanzheng, Liu, Aijun, Chen, Xuekun, Wang, Jiaqi, and Yu, Changjun
- Subjects
- *
CONVOLUTIONAL neural networks , *TRACKING algorithms , *AUTOMATIC identification - Abstract
This paper proposes a method for the intelligent detection of high-frequency surface wave radar (HFSWR) targets. This method cascades the variability index (VI) adaptive constant false alarm rate (CFAR) detector with a convolutional neural network (CNN) to form the cascade detector (VI)CFAR-CNN. First, the (VI)CFAR algorithm is used for the first-level detection of the range–Doppler (RD) spectrum; based on this result, two-dimensional window slices centered on the target's position on the RD spectrum are extracted and input into the CNN model for further target and clutter identification. When the detection rate of the detector reaches a certain level and cannot be further improved due to the convergence of the CNN model, this paper uses a dual-detection maps fusion method to compensate for the loss of detection performance. First, the optimized parameters are used to perform the weighted fusion of the dual-detection maps, and then the connected components in the fused detection map are further processed so that an independent (VI)CFAR pass compensates the (VI)CFAR-CNN detection results. Due to the difficulty in obtaining HFSWR data that include comprehensive and accurate target truth values, this paper adopts a method of embedding targets into the measured background to construct the RD spectrum dataset for HFSWR. At the same time, the proposed method is compared with various other methods to demonstrate its superiority. Additionally, a small amount of automatic identification system (AIS) and radar correlation data are used to verify the effectiveness and feasibility of this method on completely measured HFSWR data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Hyperspectral Image Denoising Based on Deep and Total Variation Priors.
- Author
-
Wang, Peng, Sun, Tianman, Chen, Yiming, Ge, Lihua, Wang, Xiaoyi, and Wang, Liguo
- Subjects
- *
DEEP learning , *IMAGE denoising , *CONVOLUTIONAL neural networks , *SPECTRAL imaging , *SPARSE matrices , *STOCHASTIC processes - Abstract
To address the problems of noise interference and image blurring in hyperspectral images (HSIs), this paper proposes an HSI denoising method based on deep learning and a total variation (TV) prior. The method minimizes the first-order moment distance between the deep prior of a Fast and Flexible Denoising Convolutional Neural Network (FFDNet) and the Enhanced 3D TV (E3DTV) prior, obtaining dual priors that complement and reinforce each other's advantages. Specifically, the original HSI is initially processed with a random binary sparse observation matrix to achieve a sparse representation. Subsequently, the plug-and-play (PnP) algorithm is employed within the framework of generalized alternating projection (GAP) to denoise the sparsely represented HSI. Experimental results demonstrate that, compared to existing methods, this method shows significant advantages in both quantitative and qualitative assessments, effectively enhancing the quality of HSIs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. MFPANet: Multi-Scale Feature Perception and Aggregation Network for High-Resolution Snow Depth Estimation.
- Author
-
Zhao, Liling, Chen, Junyu, Shahzad, Muhammad, Xia, Min, and Lin, Haifeng
- Subjects
- *
SNOW accumulation , *MICROWAVE remote sensing , *SYNTHETIC aperture radar , *REMOTE-sensing images , *DEPTH perception , *REMOTE sensing , *AVALANCHES - Abstract
Accurate snow depth estimation is of significant importance, particularly for preventing avalanche disasters and predicting flood seasons. The predominant deep learning approaches to snow depth estimation typically rely on passive microwave remote sensing data. However, the low resolution of passive microwave data often yields low-accuracy outcomes, considerably limiting its application. To further improve the accuracy of snow depth estimation, in this paper, we used active microwave remote sensing data. We fused multi-spectral optical satellite images, synthetic aperture radar (SAR) images, and land cover distribution images to generate a snow remote sensing dataset (SRSD). It is a first-of-its-kind dataset that includes active microwave remote sensing images in high-latitude regions of Asia. Using these novel data, we proposed a multi-scale feature perception and aggregation neural network (MFPANet) that focuses on improving feature extraction from multi-source images. Our systematic analysis reveals that the proposed approach is not only robust but also achieves high accuracy in snow depth estimation compared to existing state-of-the-art methods, with an RMSE of 0.360 and an MAE of 0.128. Finally, we selected several representative areas in our study region and applied our method to map snow depth distribution, demonstrating its broad application prospects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Transfer Learning-Based Specific Emitter Identification for ADS-B over Satellite System.
- Author
-
Liu, Mingqian, Chai, Yae, Li, Ming, Wang, Jiakun, and Zhao, Nan
- Subjects
- *
CONVOLUTIONAL neural networks , *LOW earth orbit satellites , *AUTOMATIC dependent surveillance-broadcast , *HUMAN fingerprints , *FEATURE extraction , *DISTRIBUTED sensors - Abstract
In future aviation surveillance, the demand for higher real-time updates for global flights can be met by deploying automatic dependent surveillance–broadcast (ADS-B) receivers on low Earth orbit satellites, capitalizing on their global coverage and terrain-independent capabilities for seamless monitoring. Specific emitter identification (SEI) leverages the distinctive features of ADS-B data. High data collection and annotation costs, along with limited dataset size, can lead to overfitting during training and low model recognition accuracy. Transfer learning, which does not require source and target domain data to share the same distribution, significantly reduces the sensitivity of traditional models to data volume and distribution. It can also address issues related to the incompleteness and inadequacy of communication emitter datasets. This paper proposes a distributed sensor system based on transfer learning to address specific emitter identification. Firstly, signal fingerprint features are extracted using a bispectrum transform (BST) to preliminarily train a convolutional neural network (CNN). Decision fusion is employed to tackle the challenges of the distributed system. Subsequently, a transfer learning strategy is employed, incorporating frozen model parameters, maximum mean discrepancy (MMD), and classification error measures to reduce the disparity between the target and source domains. A hyperbolic space module is introduced before the output layer to enhance the expressive capacity and data information extraction. After iterative training, the transfer learning model is obtained. Simulation results confirm that this method enhances model generalization, addresses the issue of slow convergence, and leads to improved training accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Combining "Deep Learning" and Physically Constrained Neural Networks to Derive Complex Glaciological Change Processes from Modern High-Resolution Satellite Imagery: Application of the GEOCLASS-Image System to Create VarioCNN for Glacier Surges.
- Author
-
Herzfeld, Ute C., Hessburg, Lawrence J., Trantow, Thomas M., and Hayes, Adam N.
- Subjects
- *
REMOTE-sensing images , *CONVOLUTIONAL neural networks , *DEEP learning , *GLACIERS , *IMAGE recognition (Computer vision) , *ACCELERATION (Mechanics) - Abstract
The objectives of this paper are to investigate the trade-offs between a physically constrained neural network and a deep convolutional neural network and to design a combined ML approach ("VarioCNN"). Our solution is provided in the framework of a cyberinfrastructure that includes a newly designed ML software, GEOCLASS-image (v1.0), modern high-resolution satellite image data sets (Maxar WorldView data), and instructions/descriptions that may facilitate solving similar spatial classification problems. Combining the advantages of the physically driven connectionist-geostatistical classification method with those of an efficient CNN, VarioCNN provides a means for rapid and efficient extraction of complex geophysical information from submeter-resolution satellite imagery. A retraining loop overcomes the difficulties of creating a labeled training data set. Computational analyses and developments are centered on a specific, but generalizable, geophysical problem: the classification of crevasse types that form during the surge of a glacier system. A surge is a glacial catastrophe, an acceleration of a glacier to typically 100–200 times its normal velocity. GEOCLASS-image is applied to study the current (2016–2024) surge in the Negribreen Glacier System, Svalbard. The geophysical result is a description of the structural evolution and expansion of the surge, based on crevasse types that capture ice deformation in six simplified classes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Improving Artificial-Intelligence-Based Individual Tree Species Classification Using Pseudo Tree Crown Derived from Unmanned Aerial Vehicle Imagery.
- Author
-
Miao, Shengjie, Zhang, Kongwen, Zeng, Hongda, and Liu, Jane
- Subjects
- *
CROWNS (Botany) , *DRONE aircraft , *CONVOLUTIONAL neural networks , *LANDSAT satellites , *URBAN trees , *ARTIFICIAL intelligence - Abstract
Urban tree classification enables informed decision-making processes in urban planning and management. This paper introduces a novel data reformation method, pseudo tree crown (PTC), which enhances the feature difference in the input layer and results in the improvement of the accuracy and efficiency of urban tree classification by utilizing artificial intelligence (AI) techniques. The study involved a comparative analysis of the performance of various machine learning (ML) classifiers. The results revealed a significant enhancement in classification accuracy, with an improvement exceeding 10% observed when high spatial resolution imagery captured by an unmanned aerial vehicle (UAV) was utilized. Furthermore, the study found an impressive average classification accuracy of 93% achieved by a classifier built on the PyTorch framework, with ResNet50 leveraged as its convolutional neural network layer. These findings underscore the potential of AI-driven approaches in advancing urban tree classification methodologies for enhanced urban planning and management practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. SSAformer: Spatial–Spectral Aggregation Transformer for Hyperspectral Image Super-Resolution.
- Author
-
Wang, Haoqian, Zhang, Qi, Peng, Tao, Xu, Zhongjie, Cheng, Xiangai, Xing, Zhongyang, and Li, Teng
- Subjects
- *
TRANSFORMER models , *HIGH resolution imaging , *CONVOLUTIONAL neural networks , *REMOTE sensing , *ENVIRONMENTAL monitoring , *SPECTRAL imaging , *IMAGE reconstruction algorithms - Abstract
The hyperspectral image (HSI) distinguishes itself in material identification through its exceptional spectral resolution. However, its spatial resolution is constrained by hardware limitations, prompting the evolution of HSI super-resolution (SR) techniques. Single HSI SR endeavors to reconstruct high-spatial-resolution HSI from low-spatial-resolution inputs, and recent progress in deep learning-based algorithms has significantly advanced the quality of reconstructed images. However, convolutional methods struggle to extract comprehensive spatial and spectral features. Transformer-based models have yet to harness long-range dependencies across both dimensions fully, thus inadequately integrating spatial and spectral data. To solve the above problem, in this paper, we propose a new HSI SR method, SSAformer, which merges the strengths of CNNs and Transformers. It introduces specially designed attention mechanisms for HSI, including spatial and spectral attention modules, and overcomes the previous challenges in extracting and amalgamating spatial and spectral information. Evaluations on benchmark datasets show that SSAformer surpasses contemporary methods in enhancing spatial details and preserving spectral accuracy, underscoring its potential to expand HSI's utility in various domains, such as environmental monitoring and remote sensing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Changes in the Water Area of an Inland River Terminal Lake (Taitma Lake) Driven by Climate Change and Human Activities, 2017–2022.
- Author
-
Zi, Feng, Wang, Yong, Lu, Shanlong, Ikhumhen, Harrison Odion, Fang, Chun, Li, Xinru, Wang, Nan, and Kuang, Xinya
- Subjects
- *
ENDORHEIC lakes , *WATER resources development , *CONVOLUTIONAL neural networks , *LAKES , *DEEP learning , *CLIMATE change - Abstract
Using a dataset capturing the seasonal and annual water body distribution of the lower Qarqan River in the Taitma Lake area from 2017 to 2022, combined with meteorological and hydraulic engineering data, the spatial and temporal change patterns of the Taitma Lake watershed area were determined. Analyses were conducted using Planetscope (PS) satellite images and a deep learning model. The results revealed the following: ① Deep learning-based water body extraction provides significantly greater accuracy than the conventional water body index approach. With an impressive accuracy of up to 96.0%, UPerNet was found to provide the most effective extraction results among the three convolutional neural networks (U-Net, DeeplabV3+, and UPerNet) used for semantic segmentation; ② Between 2017 and 2022, Taitma Lake's water area experienced a rapid decrease, with the distribution of water predominantly shifting along the east–west direction more than the north–south. The shifts between 2017 and 2020 and between 2020 and 2022 were clearly discernible, with the latter stage (2020–2022) being more significant than the former (2017–2020); ③ Observations indicate that Taitma Lake's changing water area has been primarily influenced by human activity over the last six years. These findings provide a valuable scientific basis for water resource allocation aiming to balance the development of water resources in the middle and upper reaches of the Tarim and Qarqan Rivers, as well as for the ecological protection of the downstream Taitma Lake. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. MEA-EFFormer: Multiscale Efficient Attention with Enhanced Feature Transformer for Hyperspectral Image Classification.
- Author
-
Sun, Qian, Zhao, Guangrui, Fang, Yu, Fang, Chenrong, Sun, Le, and Li, Xingying
- Subjects
- *
IMAGE recognition (Computer vision) , *CONVOLUTIONAL neural networks , *DEEP learning , *TRANSFORMER models , *FEATURE extraction - Abstract
Hyperspectral image classification (HSIC) has garnered increasing attention among researchers. While classical networks like convolutional neural networks (CNNs) have achieved satisfactory results with the advent of deep learning, they are confined to processing local information. Vision transformers, despite being effective at establishing long-distance dependencies, face challenges in extracting high-representation features for high-dimensional images. In this paper, we present the multiscale efficient attention with enhanced feature transformer (MEA-EFFormer), which is designed for the efficient extraction of spectral–spatial features, leading to effective classification. MEA-EFFormer employs a multiscale efficient attention feature extraction module to initially extract 3D convolution features and applies effective channel attention to refine spectral information. Following this, 2D convolution features are extracted and integrated with local binary pattern (LBP) spatial information to augment their representation. Then, the processed features are fed into a spectral–spatial enhancement attention (SSEA) module that facilitates interactive enhancement of spectral–spatial information across the three dimensions. Finally, these features undergo classification through a transformer encoder. We evaluate MEA-EFFormer against several state-of-the-art methods on three datasets and demonstrate its outstanding HSIC performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Locating and Grading of Lidar-Observed Aircraft Wake Vortex Based on Convolutional Neural Networks.
- Author
-
Zhang, Xinyu, Zhang, Hongwei, Wang, Qichao, Liu, Xiaoying, Liu, Shouxin, Zhang, Rongchuan, Li, Rongzhong, and Wu, Songhua
- Subjects
- *
CONVOLUTIONAL neural networks , *DOPPLER lidar , *AERONAUTICAL safety measures - Abstract
Aircraft wake vortices are serious threats to aviation safety. The Pulsed Coherent Doppler Lidar (PCDL) has been widely used in the observation of aircraft wake vortices due to its advantages of high spatial-temporal resolution and high precision. However, the post-processing algorithms require significant computing resources, which precludes real-time detection of a wake vortex (WV). This paper presents an improved Convolutional Neural Network (CNN) method for WV locating and grading based on PCDL data to avoid the influence of unstable ambient wind fields on the localization and classification results of WV. Typical WV cases are selected for analysis, and the WV locating and grading models are validated on different test sets. The consistency of the analytical algorithm and the CNN algorithm is verified. The results indicate that the improved CNN method achieves satisfactory recognition accuracy with higher efficiency and better robustness, especially in the case of strong turbulence, where the CNN method recognizes the wake vortex while the analytical method cannot. The improved CNN method is expected to be applied to optimize the current aircraft spacing criteria, which is promising in terms of aviation safety and economic benefit. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Object-Based Semi-Supervised Spatial Attention Residual UNet for Urban High-Resolution Remote Sensing Image Classification.
- Author
-
Lu, Yuanbing, Li, Huapeng, Zhang, Ce, and Zhang, Shuqing
- Subjects
- *
CONVOLUTIONAL neural networks , *DISTRIBUTION (Probability theory) , *WILCOXON signed-rank test , *DEEP learning , *LAND cover - Abstract
Accurate urban land cover information is crucial for effective urban planning and management. While convolutional neural networks (CNNs) demonstrate superior feature learning and prediction capabilities using image-level annotations, the inherent mixed-category nature of input image patches leads to classification errors along object boundaries. Fully convolutional neural networks (FCNs) excel at pixel-wise fine segmentation, making them less susceptible to heterogeneous content, but they require fully annotated dense image patches, which may not be readily available in real-world scenarios. This paper proposes an object-based semi-supervised spatial attention residual UNet (OS-ARU) model. First, multiscale segmentation is performed to obtain segments from a remote sensing image, and segments containing sample points are assigned the categories of the corresponding points, which are used to train the model. Then, the trained model predicts class probabilities for all segments. Each unlabeled segment's probability distribution is compared against those of labeled segments for similarity matching under a threshold constraint. Through label propagation, pseudo-labels are assigned to unlabeled segments exhibiting high similarity to labeled ones. Finally, the model is retrained using the augmented training set incorporating the pseudo-labeled segments. Comprehensive experiments on aerial image benchmarks for Vaihingen and Potsdam demonstrate that the proposed OS-ARU achieves higher classification accuracy than state-of-the-art models, including OCNN, 2OCNN, and standard OS-U, reaching an overall accuracy (OA) of 87.83% and 86.71%, respectively. The performance improvements over the baseline methods are statistically significant according to the Wilcoxon Signed-Rank Test. Despite using significantly fewer sparse annotations, this semi-supervised approach still achieves comparable accuracy to the same model under full supervision. 
The proposed method thus makes a step forward in substantially alleviating the heavy sampling burden of FCNs (densely sampled deep learning models) to effectively handle the complex issue of land cover information identification and classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Remote Sensing Image Dehazing via a Local Context-Enriched Transformer.
- Author
-
Nie, Jing, Xie, Jin, and Sun, Hanqing
- Subjects
- *
TRANSFORMER models , *REMOTE sensing , *CONVOLUTIONAL neural networks , *IMAGE reconstruction , *IMAGE processing - Abstract
Remote sensing image dehazing is a well-known remote sensing image processing task focused on restoring clean images from hazy images. The Transformer network, based on the self-attention mechanism, has demonstrated remarkable advantages in various image restoration tasks due to its capacity to capture long-range dependencies within images. However, it is weak at modeling local context. Conversely, convolutional neural networks (CNNs) are adept at capturing local contextual information. Local contextual information provides finer details, while long-range dependencies capture global structure; combining the two is beneficial for remote sensing image dehazing. Therefore, in this paper, we propose a CNN-based adaptive local context enrichment module (ALCEM) to extract contextual information within local regions. Subsequently, we integrate the proposed ALCEM into the multi-head self-attention and feed-forward network of the Transformer, constructing a novel locally enhanced attention (LEA) and a local continuous-enhancement feed-forward network (LCFN). The LEA utilizes the ALCEM to inject local context information that is complementary to the long-range relationships modeled by multi-head self-attention, which is beneficial for removing haze and restoring details. The LCFN extracts multi-scale spatial information and selectively fuses it via the ALCEM, supplying more informative features than existing regular feed-forward networks, which carry only position-specific information flow. Powered by the LEA and LCFN, a novel Transformer-based dehazing network termed LCEFormer is proposed to restore clear images from hazy remote sensing images, combining the advantages of CNNs and Transformers. Experiments conducted on three distinct datasets, namely DHID, ERICE, and RSID, demonstrate that our proposed LCEFormer achieves state-of-the-art performance in hazy scenes. Specifically, our LCEFormer outperforms DCIL by 0.78 dB in PSNR and 0.018 in SSIM on the DHID dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Detection of Military Targets on Ground and Sea by UAVs with Low-Altitude Oblique Perspective.
- Author
-
Zeng, Bohan, Gao, Shan, Xu, Yuelei, Zhang, Zhaoxiang, Li, Fan, and Wang, Chenghang
- Subjects
- *
CONVOLUTIONAL neural networks , *TRANSFORMER models - Abstract
Small-scale low-altitude unmanned aerial vehicles (UAVs) equipped with perception capability for military targets will become increasingly essential for strategic reconnaissance and stationary patrols in the future. To respond to challenges such as complex terrain and weather variations, as well as the deception and camouflage of military targets, this paper proposes a hybrid detection model that combines Convolutional Neural Network (CNN) and Transformer architecture in a decoupled manner. The proposed detector consists of the C-branch and the T-branch. In the C-branch, Multi-gradient Path Network (MgpNet) is introduced, inspired by the multi-gradient flow strategy, excelling in capturing the local feature information of an image. In the T-branch, RPFormer, a Region–Pixel two-stage attention mechanism, is proposed to aggregate the global feature information of the whole image. A feature fusion strategy is proposed to merge the feature layers of the two branches, further improving the detection accuracy. Furthermore, to better simulate real UAVs' reconnaissance environments, we construct a dataset of military targets in complex environments captured from an oblique perspective to evaluate the proposed detector. In ablation experiments, different fusion methods are validated, and the results demonstrate the effectiveness of the proposed fusion strategy. In comparative experiments, the proposed detector outperforms most advanced general detectors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Prediction of Sea Surface Temperature Using U-Net Based Model.
- Author
-
Ren, Jing, Wang, Changying, Sun, Ling, Huang, Baoxiang, Zhang, Deyu, Mu, Jiadong, and Wu, Jianqiang
- Subjects
- *
OCEAN temperature , *CONVOLUTIONAL neural networks - Abstract
Sea surface temperature (SST) is a key parameter in ocean hydrology. Currently, existing SST prediction methods fail to fully utilize the potential spatial correlation between variables. To address this challenge, we propose a spatiotemporal UNet (ST-UNet) model based on the UNet model. In particular, in the encoding phase of ST-UNet, we use parallel convolutions with different kernel sizes to efficiently extract spatial features, and use ConvLSTM to capture temporal features building on the extracted spatial features. An Atrous Spatial Pyramid Pooling (ASPP) module is placed at the bottleneck of the network to further incorporate multi-scale features, allowing the spatial features to be fully utilized. The final prediction is then generated in the decoding stage using parallel convolutions with different kernel sizes, similar to the encoding stage. We conducted a series of experiments on the Bohai Sea and Yellow Sea SST dataset, as well as the South China Sea SST dataset, using SST data from the past 35 days to predict SST for 1, 3, and 7 days into the future. The model was trained using data spanning 2010 to 2021, with data from 2022 utilized to assess the model's predictive performance. The experimental results show that the proposed model achieves excellent results at different prediction scales in both sea areas and consistently outperforms other methods. Specifically, in the Bohai Sea and Yellow Sea areas, for prediction scales of 1, 3, and 7 days, the MAE of ST-UNet outperforms the best results of the other three compared models by 17%, 12%, and 2%, and the MSE by 16%, 18%, and 9%, respectively. In the South China Sea, for prediction ranges of 1, 3, and 7 days, the MAE of ST-UNet outperforms the best of the other three compared models by 27%, 18%, and 3%, and the MSE by 46%, 39%, and 16%, respectively. Our results highlight the effectiveness of the ST-UNet model in capturing spatial correlations and accurately predicting SST. The proposed model is expected to improve marine hydrographic studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. A Renovated Framework of a Convolution Neural Network with Transformer for Detecting Surface Changes from High-Resolution Remote-Sensing Images.
- Author
-
Yao, Shunyu, Wang, Han, Su, Yalu, Li, Qing, Sun, Tao, Liu, Changjun, Li, Yao, and Cheng, Deqiang
- Subjects
- *
CONVOLUTIONAL neural networks , *TRANSFORMER models , *SURFACE of the earth , *FEATURE extraction , *REMOTE sensing - Abstract
Natural hazards are considered to have a strong link with climate change and human activities. With the rapid advancements in remote sensing technology, real-time monitoring and high-resolution remote-sensing images have become increasingly available, which provide precise details about the Earth's surface and enable prompt updates to support risk identification and management. This paper proposes a new network framework with a Transformer architecture and a Residual network for detecting changes in high-resolution remote-sensing images. The proposed model is trained using remote-sensing images from Shandong and Anhui Provinces of China in 2021 and 2022, while one district in 2023 is used to test the prediction accuracy. The performance of the proposed model is evaluated using five metrics and further compared to both convolution-based and attention-based models. The results demonstrated that the proposed structure integrates the strong image feature extraction capability of convolutional neural networks with the attention mechanism's ability to obtain global context, resulting in significant improvements in balancing positive sample identification while avoiding false positives in complex image change detection. Additionally, a toolkit supporting image preprocessing is developed for practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Ship Detection with Deep Learning in Optical Remote-Sensing Images: A Survey of Challenges and Advances.
- Author
-
Zhao, Tianqi, Wang, Yongcheng, Li, Zheng, Gao, Yunxiao, Chen, Chi, Feng, Hao, and Zhao, Zhikang
- Subjects
- *
DEEP learning , *REMOTE-sensing images , *OPTICAL remote sensing , *OPTICAL images , *CONVOLUTIONAL neural networks , *TRANSFORMER models , *FEATURE extraction - Abstract
Ship detection aims to automatically identify whether there are ships in an image and to precisely classify and localize them. Whether utilizing early manually designed methods or deep learning technology, ship detection is dedicated to exploring the inherent characteristics of ships to enhance recall. Nowadays, high-precision ship detection plays a crucial role in civilian and military applications. In order to provide a comprehensive review of ship detection in optical remote-sensing images (SDORSIs), this paper summarizes the challenges as a guide. These challenges include complex marine environments, insufficient discriminative features, large scale variations, dense and rotated distributions, large aspect ratios, and imbalances between positive and negative samples. We meticulously review the improvement methods and conduct a detailed analysis of their strengths and weaknesses. We compile ship information from common optical remote-sensing image datasets and compare algorithm performance. Simultaneously, we compare and analyze the feature extraction capabilities of backbones based on CNNs and Transformers, seeking new directions for development in SDORSIs. Promising prospects are provided to facilitate further research in the future. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Object Identification in Land Parcels Using a Machine Learning Approach.
- Author
-
Gundermann, Niels, Löwe, Welf, Fransson, Johan E. S., Olofsson, Erika, and Wehrenpfennig, Andreas
- Subjects
- *
MACHINE learning , *CONVOLUTIONAL neural networks , *IMAGE recognition (Computer vision) , *ARTIFICIAL intelligence , *LAND use - Abstract
This paper introduces an AI-based approach to detect human-made objects, and changes in these, on land parcels. To this end, we used binary image classification performed by a convolutional neural network. Binary classification requires the selection of a decision boundary, and we provide a deterministic method for this selection. Furthermore, we varied different parameters to improve the performance of our approach, leading to a true positive rate of 91.3% and a true negative rate of 63.0%. A specific application of our work supports the administration of agricultural land parcels eligible for subsidies. As a result of our findings, authorities could reduce the effort involved in the detection of human-made changes by approximately 50%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF