96 results for "Attention mechanisms"
Search Results
2. The novel graph transformer-based surrogate model for learning physical systems
- Author
- Feng, Bo and Zhou, Xiao-Ping
- Published
- 2024
- Full Text
- View/download PDF
3. Emotion Contextual Fusion Network: a Simple yet Versatile Approach for Emotion Recognition in Textual Conversations
- Author
- Luca, Nicoleta, Gifu, Daniela, and Trandabat, Diana
- Published
- 2024
- Full Text
- View/download PDF
4. Fine-tuned SegFormer for enhanced fetal head segmentation.
- Author
- Joudi, Niama Assia El, Lazaar, Mohamed, Delmotte, François, Allaoui, Hamid, and Mahboub, Oussama
- Subjects
- TRANSFORMER models, CONVOLUTIONAL neural networks, COMPUTER vision, COMPUTER engineering, COMPUTER-assisted image analysis (Medicine), DEEP learning
- Abstract
Several challenges in computer vision prompted the research community to propose innovative approaches and unravel new perspectives to optimize deep learning models, thus enhancing the efficiency of vision tasks. Due to their growing applications and promising results in the NLP domain, Transformer models have inspired researchers to adapt this technology to computer vision problems by introducing Vision Transformer networks. For this purpose, this paper provides a detailed comparative study of the properties of internal representations of Vision Transformers and Convolutional Neural Networks and presents some recent medical hybrid applications. On the other hand, the segmentation task poses various challenges due to the diversity of object position and size. Hence, the development of new approaches and techniques is required. In this vein, we adapted the SegFormer model in order to segment the fetal head circumference. Our hybrid model provides a competitive and notable result with a Dice Coefficient of 93.54%, relying only on a small training dataset. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2024
- Full Text
- View/download PDF
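The contrast entry 4 draws between Vision Transformer and CNN representations comes down to how information is mixed: a convolution mixes only within a local receptive field, while self-attention lets every image patch attend to every other. The sketch below illustrates that difference in plain PyTorch; layer sizes and names are illustrative assumptions, not the paper's SegFormer configuration.

```python
# Illustrative sketch (not the paper's code): contrast a local convolution with
# global self-attention over image patches, the core difference the entry compares.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)           # one RGB image

# CNN view: a 3x3 convolution mixes information only within a local neighbourhood.
local_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
cnn_features = local_conv(image)               # (1, 64, 224, 224)

# ViT/SegFormer-style view: split the image into 16x16 patches, embed each patch
# as a token, and let every token attend to every other token.
patch_embed = nn.Conv2d(3, 64, kernel_size=16, stride=16)    # patch embedding
tokens = patch_embed(image).flatten(2).transpose(1, 2)        # (1, 196, 64)
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
vit_features = encoder_layer(tokens)            # (1, 196, 64), globally mixed

print(cnn_features.shape, vit_features.shape)
```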
5. Enhancing abusive language detection: A domain-adapted approach leveraging BERT pre-training tasks.
- Author
- Jarquín-Vásquez, Horacio, Escalante, Hugo Jair, and Montes-y-Gómez, Manuel
- Subjects
- INVECTIVE, LANGUAGE models, OCCUPATIONAL retraining, POPULARITY, DEFAULT (Finance), DEEP learning
- Abstract
The widespread adoption of deep learning approaches in natural language processing is largely attributed to their exceptional performance across diverse tasks. Notably, Transformer-based models, such as BERT, have gained popularity for their remarkable efficacy and their ease of adaptation (via fine-tuning) across various domains. Despite their success, fine-tuning these models for informal language, particularly instances involving offensive expressions, presents a major challenge due to limitations in vocabulary coverage and contextual information for such tasks. To address these challenges, we propose the domain adaptation of the BERT language model for the task of detecting abusive language. Our approach involves constraining the language model with the adaptation and paradigm shift of two default pre-training tasks, the design of two datasets specifically engineered to support the adapted pre-training tasks, and the proposal of a dynamic weighting loss function. The evaluation of these adapted configurations on six datasets dedicated to abusive language detection reveals promising outcomes, with a significant enhancement observed compared to the base model. Furthermore, our proposed methods yield competitive results when compared to state-of-the-art approaches, establishing a robust and easily trainable model for the effective identification of abusive language. • The use of tailored pre-training tasks enhances the detection of abusive language. • Carefully designed pre-training tasks can reduce the need for large volumes of training data. • The joint retraining of the pre-training tasks competes with contemporary benchmarks. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2024
- Full Text
- View/download PDF
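Entry 5 mentions a "dynamic weighting loss function" for balancing two adapted pre-training tasks. The exact loss is not described here; as a hedged sketch, below is one common way to weight two task losses dynamically with learnable uncertainty-style parameters. The class name and setup are assumptions for illustration only.

```python
# Hedged sketch of one common way to weight two pre-training losses dynamically
# (homoscedastic-uncertainty style); the paper's exact loss may differ.
import torch
import torch.nn as nn

class DynamicWeightedLoss(nn.Module):
    """Combine two task losses with learnable weights."""
    def __init__(self):
        super().__init__()
        # One learnable log-variance per pre-training task.
        self.log_vars = nn.Parameter(torch.zeros(2))

    def forward(self, loss_task_a, loss_task_b):
        losses = torch.stack([loss_task_a, loss_task_b])
        precision = torch.exp(-self.log_vars)
        # Weighted sum plus a regulariser that keeps the weights from collapsing.
        return (precision * losses + self.log_vars).sum()

# Example: stand-in losses from two adapted pre-training tasks.
criterion = DynamicWeightedLoss()
total = criterion(torch.tensor(2.3), torch.tensor(0.7))
print(total)
```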
6. Design of Translation Error Correction System Based on Improved Seq2Seq.
- Author
- Chen, Ting
- Subjects
- MACHINE translating, LANGUAGE research, SPEECH, CHINESE language, COMPUTATIONAL linguistics, ERROR correction (Information theory)
- Abstract
Speech translation technology is an important booster for social communication and the advancement of human civilization. With the steady advancement of theories and technologies such as speech processing and machine translation, and the continuous development of computer science, computing power and storage capacity have been further improved. Speech translation systems for widely used languages such as English, French, and Chinese have successively reached a commercial level. However, the development of speech translation systems is limited by corpus resources and a lack of bilingual language research, and experiments and applications of speech translation in some languages are still in their early stages. In addition, existing research mostly builds on cascaded speech translation systems, tackling the key issues individually, and rarely adopts direct speech translation methods that avoid intermediate text representations, which leads to problems such as cumbersome research pipelines, complex translation models, and long translation delays. This article focuses on the design of a translation error correction system based on an improved Seq2Seq model. First, relevant research is summarized, application methods are proposed, and the results are discussed, with the aim of offering inspiration to researchers and supporting the development of related fields. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2024
- Full Text
- View/download PDF
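Entry 6 builds on an attention-augmented Seq2Seq model. As a hedged, minimal sketch of the standard mechanism such systems extend, the snippet below runs one decoder step with dot-product (Luong-style) attention over encoder states; all tensor sizes are illustrative and none of this is the paper's code.

```python
# Minimal sketch of one decoder step with dot-product (Luong-style) attention
# over encoder states; illustrative only, not the paper's "improved Seq2Seq".
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, src_len, hidden = 2, 7, 32
encoder_states = torch.randn(batch, src_len, hidden)   # encoder outputs
decoder_state = torch.randn(batch, hidden)              # current decoder hidden state

# Attention scores: dot product between the decoder state and each encoder state.
scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
weights = F.softmax(scores, dim=1)                       # attention distribution

# Context vector: attention-weighted sum of encoder states.
context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)        # (batch, hidden)

# The context is combined with the decoder state before predicting the next token.
combine = nn.Linear(2 * hidden, hidden)
attentional_state = torch.tanh(combine(torch.cat([context, decoder_state], dim=1)))
print(attentional_state.shape)                           # (batch, hidden)
```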
7. A systematic survey of air quality prediction based on deep learning.
- Author
- Zhang, Zhen, Zhang, Shiqing, Chen, Caimei, and Yuan, Jiwei
- Subjects
- DEEP learning, AIR quality, AIR pollution, ENVIRONMENTAL sciences, FORECASTING, MACHINE learning
- Abstract
The impact of air pollution on public health is substantial, and accurate long-term predictions of air quality are crucial for early warning systems to address this issue. Air quality prediction has drawn significant attention, bridging environmental science, statistics, and computer science. This paper presents a comprehensive review of the current research status and advances in air quality prediction methods. Deep learning, a novel machine learning approach, has demonstrated remarkable proficiency in identifying complex, nonlinear patterns in air quality data, yet its application in air quality prediction is still relatively nascent. This paper also conducts a systematic analysis and summarizes how cutting-edge deep learning models are applied in air quality prediction. Initially, the historical evolution of air quality prediction methods and datasets is presented. This is followed by an examination of conventional air quality prediction techniques. A thorough comparative analysis of progress made with both traditional and deep learning-based prediction methods is provided. This review particularly focuses on three aspects: temporal modeling, spatiotemporal modeling, and attention mechanisms. Finally, emerging trends in the field of air quality prediction are identified and discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Deep learning with multi-scale temporal hybrid structure for robust crop mapping.
- Author
- Tang, Pengfei, Chanussot, Jocelyn, Guo, Shanchuan, Zhang, Wei, Qie, Lu, Zhang, Peng, Fang, Hong, and Du, Peijun
- Subjects
- CONVOLUTIONAL neural networks, DEEP learning, HEBBIAN memory, TIME series analysis, TIME-varying networks, CROPS
- Abstract
Large-scale crop mapping from dense time-series images is a difficult task and becomes even more challenging with cloud coverage. Current deep learning models frequently represent time series from a single perspective, which is insufficient to obtain fine-grained details. Meanwhile, the impact of cloud noise on deep learning models is not yet fully understood. In this study, a Multi-scale Temporal Transformer-Conv network (Ms-TTC) is proposed for robust crop mapping under frequent cloud cover. The Ms-TTC enhances temporal representations by effectively combining the global modeling capability of self-attention with the local capture capability of convolutional neural networks (CNNs) at multi-temporal scales. The Ms-TTC network consists of three main components: (1) a temporal encoder module that explores global and local temporal relationships at multi-temporal scales, (2) an attention-based fusion module that effectively fuses multi-scale temporal features, and (3) an output module that concatenates the high-level time series features and refined multi-scale features to predict the label. The proposed model demonstrated superior performance compared to state-of-the-art methods on the large-scale time series dataset, FranceCrops, achieving a minimum improvement of 2% in mF1 scores. Subsequently, gradient back-propagation-based feature importance analysis was used to investigate the behavior of deep learning models for processing time series data with cloud noise. The results revealed that most deep learning models can suppress cloudy observations to some degree, and models with a global field of view had superior cloud masking but also lost some local temporal information. Clouds can influence the model's attention towards the spectral dimension, particularly affecting the visible and vegetation red-edge bands, which exhibit higher sensitivity to cloud noise and play a crucial role in performance. This study provides a feasible approach for large-scale dynamic crop mapping independently of cloudy conditions by combining global-local temporal representations at multiple scales. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2024
- Full Text
- View/download PDF
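Entry 8 combines the global modeling of self-attention with the local pattern capture of a CNN over a time series. The sketch below shows one minimal way to pair and fuse the two branches; channel sizes, layer names, and the fusion choice are assumptions, not the Ms-TTC design.

```python
# Hedged sketch of a global/local temporal block: self-attention for long-range
# relations plus 1-D convolution for local patterns, fused by concatenation.
import torch
import torch.nn as nn

class GlobalLocalTemporalBlock(nn.Module):
    def __init__(self, channels=32, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.local = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Linear(2 * channels, channels)

    def forward(self, x):                      # x: (batch, time, channels)
        global_feat, _ = self.attn(x, x, x)    # every time step attends to all others
        local_feat = self.local(x.transpose(1, 2)).transpose(1, 2)  # local neighbourhood
        return self.fuse(torch.cat([global_feat, local_feat], dim=-1))

series = torch.randn(8, 45, 32)                # e.g. 45 acquisition dates, 32 features
print(GlobalLocalTemporalBlock()(series).shape)   # (8, 45, 32)
```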
9. SLAPP: Subgraph-level attention-based performance prediction for deep learning models.
- Author
- Wang, Zhenyi, Yang, Pengfei, Hu, Linwei, Zhang, Bowen, Lin, Chengmin, Lv, Wenkai, and Wang, Quan
- Subjects
- DEEP learning, NETWORK performance, FORECASTING
- Abstract
The intricacy of the Deep Learning (DL) landscape, brimming with a variety of models, applications, and platforms, poses considerable challenges for the optimal design, optimization, or selection of suitable DL models. One promising avenue to address this challenge is the development of accurate performance prediction methods. However, existing methods reveal critical limitations. Operator-level methods, proficient at predicting the performance of individual operators, often neglect broader graph features, which results in inaccuracies in full network performance predictions. On the contrary, graph-level methods excel in overall network prediction by leveraging these graph features but lack the ability to predict the performance of individual operators. To bridge these gaps, we propose SLAPP, a novel subgraph-level performance prediction method. Central to SLAPP is an innovative variant of Graph Neural Networks (GNNs) that we developed, named the Edge Aware Graph Attention Network (EAGAT). This specially designed GNN enables superior encoding of both node and edge features. Through this approach, SLAPP effectively captures both graph and operator features, thereby providing precise performance predictions for individual operators and entire networks. Moreover, we introduce a mixed loss design with dynamic weight adjustment to reconcile the predictive accuracy between individual operators and entire networks. In our experimental evaluation, SLAPP consistently outperforms traditional approaches in prediction accuracy, including the ability to handle unseen models effectively. Moreover, when compared to existing research, our method demonstrates a superior predictive performance across multiple DL models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Ultra-Attention: Automatic Recognition of Liver Ultrasound Standard Sections Based on Visual Attention Perception Structures.
- Author
- Zhang, Jiansong, Chen, Yongjian, Zeng, Pan, Liu, Yao, Diao, Yong, and Liu, Peizhong
- Subjects
- VISUAL perception, COMPUTER-assisted image analysis (Medicine), ULTRASONIC imaging, NATURAL language processing, LIVER, IMAGE recognition (Computer vision), EYE tracking
- Abstract
Acquisition of a standard section is a prerequisite for ultrasound diagnosis. For a long time, there has been a lack of clear definitions of standard liver views because of physician experience. The accurate automated scanning of standard liver sections, however, remains one of ultrasonography medicine's most important issues. In this article, we enrich and expand the classification criteria of liver ultrasound standard sections from clinical practice and propose an Ultra-Attention structured perception strategy to automate the recognition of these sections. Inspired by the attention mechanism in natural language processing, the standard liver ultrasound views will participate in the global attention algorithm as modular local images in computer vision of ultrasound images, which will significantly amplify small features that would otherwise go unnoticed. In addition to using the dropout mechanism, we also use a Part-Transfer Learning training approach to fine-tune the model's rate of convergence to increase its robustness. The proposed Ultra-Attention model outperforms various traditional convolutional neural network-based techniques, achieving the best known performance in the field with a classification accuracy of 93.2%. As part of the feature extraction procedure, we also illustrate and compare the convolutional structure and the Ultra-Attention approach. This analysis provides a reasonable view for future research on local modular feature capture in ultrasound images. By developing a standard scan guideline for liver ultrasound-based illness diagnosis, this work will advance the research on automated disease diagnosis that is directed by standard sections of liver ultrasound. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Semantic segmentation model based on edge information for rock structural surface traces detection.
- Author
- Yuan, Xiaofeng, Wu, Dun, Wang, Yalin, Yang, Chunhua, Gui, Weihua, Cheng, Shuqiao, Ye, Lingjian, and Shen, Feifan
- Subjects
- CONVOLUTIONAL neural networks, ENGINEERING geology, TRANSFORMER models, INFORMATION processing
- Abstract
Fast and accurate detection of rock structural surface traces is crucial for geology and engineering fields. In recent years, deep learning techniques like U-Net (UNet) have been applied to rock structural surface traces detection by virtue of their high accuracy and strong robustness. However, the loss of important information during the downsampling process may hinder the model performance for rock structural surface traces detection. To alleviate this problem, this paper proposes a semantic segmentation model based on edge information (Edge-UNet) for rock structural surface traces detection. In Edge-UNet, an edge pooling method is designed, which can retain more trace features rich in edge information in the downsampling process, so as to enhance the learning of the model for traces. Then, an edge semantic enhancement structure based on edge pooling is designed to strengthen the edge information in Edge-UNet's encoder. In addition, a channel-space attention gate based on edge information is incorporated in Edge-UNet's decoder, which helps the model capture fine trace features. These designs clarify the retention and utilization of edge information in principle, which enhances the interpretability of the model. Finally, convolutional neural network-based and Transformer-based semantic segmentation models were selected for comparison experiments with Edge-UNet. From the experimental results, Edge-UNet outperforms the other models in three performance metrics, which verifies the superior performance of Edge-UNet in the rock structural surface trace detection task. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2025
- Full Text
- View/download PDF
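Entry 11 places a channel-space attention gate in the decoder. A much simpler additive attention gate in the spirit of Attention U-Net is sketched below to show how a gate can re-weight skip-connection features; Edge-UNet's edge-informed gate is more elaborate than this.

```python
# Simplified additive attention gate (Attention U-Net style), sketched as an
# illustration of how a decoder can re-weight skip-connection features.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)   # project skip features
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)     # project gating signal
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)           # attention coefficients

    def forward(self, skip, gate):
        # skip: encoder feature map; gate: decoder feature map of the same spatial size
        attn = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * attn                                          # suppress irrelevant regions

skip = torch.randn(1, 64, 56, 56)
gate = torch.randn(1, 128, 56, 56)
print(AttentionGate(64, 128, 32)(skip, gate).shape)                 # (1, 64, 56, 56)
```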
12. ASK-HAR: Attention-Based Multi-Core Selective Kernel Convolution Network for Human Activity Recognition.
- Author
- Yu, Xugao and Al-qaness, Mohammed A.A.
- Subjects
- HUMAN activity recognition, DEEP learning, CONVOLUTIONAL neural networks, SENSOR networks, FEATURE extraction
- Abstract
Human Activity Recognition (HAR) is an increasingly popular field of study aimed at automatically identifying and categorizing human movements and activities using various tracking devices and measurements, such as sensors and cameras. Smartphones have emerged as a prevalent sensor modality for HAR, offering abundant data on an individual's movements through GPS, accelerometers, and gyroscopes. In conventional convolutional neural networks (CNNs), the artificial neurons within each feature layer typically possess identical receptive fields (RFs). This paper introduces a novel model, ASK-HAR (Attention-based Multi-Core Selective Kernel Convolution Network for HAR), which enhances HAR by performing kernel selection between multiple branches with different RFs using attention mechanisms. The selective kernel mechanism is leveraged to optimize HAR performance. Additionally, the CBAM attention module is employed for time series feature extraction and activity recognition within the overall framework. Extensive experiments conducted on benchmark HAR datasets, including UCI-HAR, USC-HAD, WISDM, PAMAP2, and DSADS, demonstrate that the ASK-HAR model consistently achieves high accuracy across all datasets. • Propose a novel sensor-based deep learning model called ASK-HAR for activity recognition. • ASK-HAR performs kernel selection between multiple branches with different RFs. • We present the CBAM module in an effort to enhance system performance even further. • CBAM combines spatial attention and channel attention to learn and capture high-level time series features. • Evaluate ASK-HAR with five public HAR datasets with extensive comparisons to existing models. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2025
- Full Text
- View/download PDF
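Entry 12 selects between convolution branches with different receptive fields using attention. Below is a hedged sketch of a selective-kernel block for 1-D inertial windows; the two-branch layout and all dimensions are illustrative assumptions rather than ASK-HAR's actual configuration.

```python
# Hedged sketch of a selective-kernel block for 1-D sensor signals: two branches
# with different receptive fields, with attention weights choosing between them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveKernel1d(nn.Module):
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.branch3 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.branch5 = nn.Conv1d(channels, channels, kernel_size=5, padding=2)
        self.squeeze = nn.Linear(channels, channels // reduction)
        self.expand = nn.Linear(channels // reduction, 2 * channels)

    def forward(self, x):                                # x: (batch, channels, time)
        feats = torch.stack([self.branch3(x), self.branch5(x)], dim=1)  # (b, 2, c, t)
        pooled = feats.sum(dim=1).mean(dim=-1)           # fuse branches, global pool
        attn = self.expand(F.relu(self.squeeze(pooled))) # per-branch channel logits
        attn = attn.view(-1, 2, x.size(1), 1).softmax(dim=1)
        return (feats * attn).sum(dim=1)                 # attention-weighted selection

window = torch.randn(16, 64, 128)                        # 16 sensor windows of 128 samples
print(SelectiveKernel1d()(window).shape)                 # (16, 64, 128)
```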
13. Dictionary trained attention constrained low rank and sparse autoencoder for hyperspectral anomaly detection.
- Author
- Hu, Xing, Li, Zhixuan, Luo, Lingkun, Karimi, Hamid Reza, and Zhang, Dawei
- Subjects
- AUTOENCODER, ENCYCLOPEDIAS & dictionaries, DETECTORS, DEEP learning
- Abstract
• Proposing an attention constrained low-rank and sparse autoencoder for hyperspectral anomaly detection. • Designing two detectors, AClrAE and ACsAE, to focus more on global background reconstruction and anomaly reconstruction. • Conducting experiments on real and synthetic datasets demonstrates the effectiveness and superiority of the proposed detectors. Dictionary representations and deep learning Autoencoder (AE) models have proven effective in hyperspectral anomaly detection. Dictionary representations offer self-explanation but struggle with complex scenarios. Conversely, autoencoders can capture details in complex scenes but lack self-explanation. Complex scenarios often involve extensive spatial information, making its utilization crucial in hyperspectral anomaly detection. To effectively combine the advantages of both methods and address the insufficient use of spatial information, we propose an attention constrained low-rank and sparse autoencoder for hyperspectral anomaly detection. This model includes two encoders: an attention constrained low-rank autoencoder (AClrAE) trained with a background dictionary and incorporating a Global Self-Attention Module (GAM) to focus on global spatial information, resulting in improved background reconstruction; and an attention constrained sparse autoencoder (ACsAE) trained with an anomaly dictionary and incorporating a Local Self-Attention Module (LAM) to focus on local spatial information, resulting in enhanced anomaly reconstruction. Finally, to merge the detection results from both encoders, a nonlinear fusion scheme is employed. Experiments on multiple real and synthetic datasets demonstrate the effectiveness and feasibility of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
14. Combining residual convolutional LSTM with attention mechanisms for spatiotemporal forest cover prediction.
- Author
- Liu, Bao, Chen, Siqi, and Gao, Lei
- Subjects
- CONVOLUTIONAL neural networks, FOREST management, STANDARD deviations, FOREST dynamics, PREDICTION models, DEEP learning
- Abstract
Understanding spatiotemporal variations in forest cover is crucial for effective forest resource management. However, existing models often lack accuracy in simultaneously capturing temporal continuity and spatial correlation. To address this challenge, we developed ResConvLSTM-Att, a novel hybrid model integrating residual neural networks, Convolutional Long Short-Term Memory (ConvLSTM) networks, and attention mechanisms. We evaluated ResConvLSTM-Att against four deep learning models: LSTM, combined convolutional neural network and LSTM (CNN-LSTM), ConvLSTM, and ResConvLSTM for spatiotemporal prediction of forest cover in Tasmania, Australia. ResConvLSTM-Att achieved outstanding prediction performance, with an average root mean square error (RMSE) of 6.9% coverage and an impressive average coefficient of determination of 0.965. Compared with LSTM, CNN-LSTM, ConvLSTM, and ResConvLSTM, ResConvLSTM-Att achieved RMSE reductions of 31.2%, 43.0%, 10.1%, and 6.5%, respectively. Additionally, we quantified the impacts of explanatory variables on forest cover dynamics. Our work demonstrated the effectiveness of ResConvLSTM-Att in spatiotemporal data modelling and prediction. • Developed ResConvLSTM-Att model for spatiotemporal forest cover prediction. • ResConvLSTM-Att combined ResNet, ConvLSTM, and dual attention mechanisms. • ResConvLSTM-Att improved long-term temporal dependency and spatial feature capture. • ResConvLSTM-Att outperformed four other deep learning models in prediction accuracy. • Identified key temporal and spatial variables impacting forest cover dynamics. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
15. Wheat growth stage identification method based on multimodal data.
- Author
- Li, Yong, Che, Yinchao, Zhang, Handan, Zhang, Shiyu, Zheng, Liang, Ma, Xinming, Xi, Lei, and Xiong, Shuping
- Subjects
- ARTIFICIAL neural networks, CONVOLUTIONAL neural networks, DEEP learning, CROP growth, MULTISENSOR data fusion, WINTER wheat
- Abstract
Accurate identification of crop growth stages is a crucial basis for implementing effective cultivation management. With the development of deep learning techniques in image understanding, research on intelligent real-time recognition of crop growth stages based on RGB images has garnered significant attention. However, the small differences and high similarity in crop morphological characteristics during the transition between adjacent growth stages pose challenges for accurate identification. To address this issue, this study proposes a multi-scale convolutional neural network model, termed MultiScalNet-Wheat (MSN-W), which enhances the algorithm's ability to learn complex features by utilizing multi-scale convolution and attention mechanisms. This model extracts key information from redundant data to identify winter wheat growth stages in complex field environments. Experimental results show that the MSN-W model achieves a recognition accuracy of 97.6 %, outperforming typical convolutional neural network models such as VGG19, ResNet50, MobileNetV3, and DenseNet. To further address the difficulty in recognizing growth stages during transition periods, where canopy morphological features are highly similar and show small differences, this paper introduces an innovative approach by incorporating sequential environmental data related to wheat growth stages. By extracting these features and performing multi-modal collaborative inference, a multi-modal feature-based wheat growth stage recognition model, termed MultiModalNet-Wheat (MMN-W), is constructed on the basis of the MSN-W model. Experimental results indicate that the MMN-W model achieves a recognition accuracy of 98.5 %, improving by 0.9 % over the MSN-W model. Both the MSN-W and MMN-W models provide accurate methods for observing wheat growth stages, thereby supporting the scientific management of winter wheat at different growth stages. • Addresses the automatic recognition of winter wheat growth stages. • Proposed MSN-W to enhance the network's capability to extract key information from redundant data. • Proposed MMN-W to further enhance the accuracy of recognizing adjacent growth stages. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
16. TUMbRAIN: A transformer with a unified mobile residual attention inverted network for diagnosing brain tumors from magnetic resonance scans.
- Author
- Montalbo, Francis Jesmar P.
- Subjects
- ARTIFICIAL neural networks, CONVOLUTIONAL neural networks, TRANSFORMER models, MACHINE learning, ARTIFICIAL intelligence, DEEP learning
- Abstract
Diagnosing tumors in Magnetic Resonance Imaging (MRI) brain scans is challenging and can lead to errors, even for radiologists. Deep learning, mainly through deep convolutional neural networks, has assisted in automating the diagnosis of these scans. However, there is still room for improvement. Researchers have shown that transformer models hold promise but often remain underutilized due to their need for large amounts of data and complexity compared to traditional neural networks. This paper introduces a new hybrid model called TUMbRAIN (Transformer with a Unified Mobile Residual Attention Inverted Network), which combines a lightweight transformer with a deep convolutional neural network to address these issues. TUMbRAIN incorporates innovative components designed for this purpose, such as the expanded inverted residual block and the unified self-attention mechanism. The results demonstrate that TUMbRAIN outperforms many existing state-of-the-art neural network models, achieving an impressive overall accuracy of 97.94 % with only 1.04 million parameters. These results suggest that hybrid transformer models like TUMbRAIN could significantly advance the automated diagnosis of brain tumors from MRI scans. The study also offers new insights into effectively integrating transformers into traditional neural network architectures, resulting in a cost-effective and accurate deep learning solution for medical imaging. By incorporating these advanced components, TUMbRAIN enhances support for radiological practice through improved diagnostic accuracy and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
17. AFC-Unet: Attention-fused full-scale CNN-transformer unet for medical image segmentation.
- Author
- Meng, Wenjie, Liu, Shujun, and Wang, Huajun
- Subjects
- IMAGE segmentation, TRANSFORMER models, DIAGNOSTIC imaging, PYRAMIDS, GENERALIZATION, MARKOV random fields
- Abstract
In the field of medical image segmentation, although U-Net has achieved significant achievements, it still exposes some inherent disadvantages when dealing with complex anatomical structures and small targets, such as inaccurate target localization, blurry edges, and insufficient integration of contextual information. To address these challenges, this study proposes the Attention-Fused Full-Scale CNN-Transformer Unet (AFC-Unet), aiming to effectively overcome the limitations of traditional U-Net through multi-scale feature fusion, attention mechanisms, and CNN-Transformer hybrid modules. Specifically, we adopt an encoder–decoder architecture, incorporating full-scale feature block fusion and pyramid sampling modules to enhance the model's ability to recognize fine to overall structural features by integrating cross-level multi-scale features. We propose the Multi-feature Fusion Attention Gates (MFAG) module, which focuses on and highlights discriminative information of potential lesions and key anatomical boundaries, effectively suppressing irrelevant background interference. We design a module Convolutional Hybrid Attention Transformer (CHAT) that integrates CNN and Transformer to address the shortcomings of traditional single models in handling long-range dependencies and global context understanding. Experimental results on three datasets of different scales demonstrate that the model's segmentation performance for medical images surpasses state-of-the-art models, showcasing high generalization ability. • Introduces AFC-Unet for enhanced medical image segmentation accuracy. • Uses full-scale feature blocks and pyramid sampling to improve edge clarity. • Implements MFAG to highlight key structures and reduce background noise. • CHAT module improves long-range dependency handling and global context. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
18. A two-stage spatial prediction modeling approach based on graph neural networks and neural processes.
- Author
- Bao, Li-Li, Zhang, Chun-Xia, Zhang, Jiang-She, and Guo, Rui
- Subjects
- GRAPH neural networks, DEEP learning, GAUSSIAN processes, PREDICTION models, ENVIRONMENTAL sciences
- Abstract
Spatial prediction models hold significant application value in fields such as environmental science, economic development, and geological exploration. With advancements in deep learning technology, graph neural networks (GNNs) offer a powerful and scalable solution for spatial data modeling. In these models, each spatial location is represented as a vertex, with the explanatory variables at each point converted into vertex features, and the vertex label corresponding to the target value at that point. GNNs utilize this representation to predict vertex labels, thereby performing spatial prediction. However, the residuals of GNN predictions are found to still exhibit spatial autocorrelation, indicating that traditional GNNs only achieve suboptimal results in terms of capturing complex spatial relationships. In this paper, we propose a two-stage spatial prediction method called Location Embedded Graph Neural Networks-Residual Neural Processes (LEGNN-RNP) to address this challenge. In the first stage, we employ LEGNNs, a new framework that integrates location features into GNNs through an attention mechanism, enhancing the learning of complex spatial models by considering both spatial context and attribute features. In the second stage, we model the residuals from LEGNN predictions to further extract spatial patterns from the data. Specifically, we introduce RNP, a neural process-based approach to model the Gaussian Process (GP) distribution of residuals. The distribution parameters (i.e., mean and covariance) are parameterized by a neural network, where the mean estimates the residual and the variance quantifies the prediction uncertainty. Experiments on four datasets demonstrate that our proposed two-stage method achieves state-of-the-art results by effectively extracting spatial relationships and significantly improving spatial prediction accuracy. • Residuals of GNN regression still exhibit spatial autocorrelation. • Location Embedded GNN can learn complex spatial correlation better. • Neural embedding-based Gaussian is used to predict residuals processed by GNN. • Proposed method performs the best in spatial prediction task. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. ResMT: A hybrid CNN-transformer framework for glioma grading with 3D MRI.
- Author
- Cui, Honghao, Ruan, Zhuoying, Xu, Zhijian, Luo, Xiao, Dai, Jian, and Geng, Daoying
- Subjects
- MAGNETIC resonance imaging, DEEP learning, TRANSFORMER models, GLIOMAS, MODEL airplanes
- Abstract
Accurate grading of gliomas is crucial for treatment strategies and prognosis. While convolutional neural networks (CNNs) have proven effective in classifying medical images, they struggle with capturing long-range dependencies among pixels. Transformer-based networks can address this issue, but CNN-based methods often perform better when trained on small datasets. Additionally, tumor segmentation is essential for classification models, but training an additional segmentation model significantly increases workload. To address these challenges, we propose ResMT, which combines CNN and transformer architectures for glioma grading, extracting both local and global features efficiently. Specifically, we designed a spatial residual module (SRM) where a 3D CNN captures glioma's volumetric complexity, and Swin UNETR, a pre-trained segmentation model, enhances the network without extra training. Our model also includes a multi-plane channel and spatial attention module (MCSA) to refine the analysis by focusing on critical features across multiple planes (axial, coronal, and sagittal). Transformer blocks establish long-range relationships among planes and slices. We evaluated ResMT on the BraTs19 dataset, comparing it with baselines and state-of-the-art models. Results demonstrate that ResMT achieves the highest prediction performance with an AUC of 0.9953, highlighting hybrid CNN-transformer models' potential for 3D MRI classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Air pollutant prediction based on a attention mechanism model of the Yangtze River Delta region in frequent heatwaves.
- Author
- Liu, Bingchun, Lai, Mingzhao, Zeng, Peng, and Chen, Jiali
- Subjects
- AIR pollutants, URBAN pollution, CLOUDINESS, AIR pollution, WEATHER
- Abstract
Heatwaves pose significant threats to urban environments, affecting both ecological systems and public health, primarily through the exacerbation of air pollution. Accurate prediction of air pollutant concentrations during heatwave periods is crucial for authorities to develop timely prevention and control strategies. Thus, we developed the 1D-CNN-BiLSTM-attention model, specifically designed to account for the unique data characteristics associated with heatwave conditions. Our model leverages an attention mechanism to enhance its ability to learn and predict air pollutant behavior during heatwaves. Across six scenario-based experiments, the model demonstrated high predictive accuracy, achieving a MAPE of 2.93 %. The model integrates meteorological indicators such as temperature, humidity, wind speed, cloud cover, and precipitation, extending its predictive capability across a spatial range of 150 km. In experiments testing the model's applicability to three typical city types in the Yangtze River Delta region, the results confirmed its effectiveness in predicting air pollutants. These findings highlight the model's usefulness for studying air pollution during urban heatwave periods on a regional scale, demonstrating its robustness and reliability under varying weather conditions. • Developed a 1D-CNN-BiLSTM-attention model that integrates feature learning on both temporal and spatial aspects of the data. • The attention mechanism enhances the adaptability of urban air pollution models to handle specific weather conditions. • Discussed the forecast performance under different combinations of weather inputs and input spatial ranges. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2024
- Full Text
- View/download PDF
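Entry 20 names a 1D-CNN-BiLSTM-attention model. A hedged reading of that pipeline is sketched below (Conv1d features, BiLSTM, attention pooling over time, regression head); every size and the exact ordering are assumptions, not the authors' configuration.

```python
# A hedged reading of a "1D-CNN-BiLSTM-attention" pipeline for pollutant series:
# Conv1d features -> BiLSTM -> attention pooling over time -> regression head.
import torch
import torch.nn as nn

class CnnBiLstmAttention(nn.Module):
    def __init__(self, n_features=8, conv_ch=32, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(n_features, conv_ch, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(conv_ch, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # scores one weight per time step
        self.head = nn.Linear(2 * hidden, 1)       # predicted pollutant concentration

    def forward(self, x):                          # x: (batch, time, n_features)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bilstm(h)                      # (batch, time, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        context = (weights * h).sum(dim=1)         # attention-weighted temporal summary
        return self.head(context)

x = torch.randn(4, 24, 8)                          # 24 hourly steps, 8 meteorological inputs
print(CnnBiLstmAttention()(x).shape)               # (4, 1)
```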
21. A lightweight Transformer-based visual question answering network with Weight-Sharing Hybrid Attention.
- Author
- Zhu, Yue, Chen, Dongyue, Jia, Tong, and Deng, Shizhuo
- Subjects
- LANGUAGE models, TRANSFORMER models, VISUAL learning, PROBLEM solving
- Abstract
Recent advances show that Transformer-based models and object detection-based models play an indispensable role in VQA. However, object detection-based models have significant limitations due to their redundant and complex detection box generation process. In contrast, Visual and Language Pre-training (VLP) models can achieve better performance, but require high computing power. To this end, we present the Weight-Sharing Hybrid Attention Network (WHAN), a lightweight Transformer-based VQA model. In WHAN, we replace the object detection network with a Transformer encoder and use LoRA to solve the problem that the language model cannot adapt to interrogative sentences. We propose the Weight-Sharing Hybrid Attention (WHA) module with parallel residual adapters, which significantly reduces the trainable parameters of the model, and we design DWA and BVA modules that allow the model to perform attention operations at different scales. Experiments on the VQA-v2, COCO-QA, GQA, and CLEVR datasets show that WHAN achieves competitive performance with far fewer trainable parameters. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2024
- Full Text
- View/download PDF
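Entry 21 adapts the language model with LoRA. The sketch below shows the core of a LoRA-style layer: a frozen linear weight plus a trainable low-rank update; it illustrates the general technique, not the WHAN implementation, and the rank and scaling are arbitrary choices.

```python
# Hedged sketch of a LoRA-style linear layer: the frozen weight is augmented by a
# low-rank update that is the only part trained; not the WHAN implementation.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)      # pre-trained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen projection plus trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(768, 768)
tokens = torch.randn(2, 10, 768)
print(layer(tokens).shape)                           # (2, 10, 768)
```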
22. S3PaR: Section-based Sequential Scientific Paper Recommendation for paper writing assistance.
- Author
- Santosa, Natasha Christabelle, Liu, Xin, Han, Hyoil, and Miyazaki, Jun
- Subjects
- GRAPH neural networks, RECOMMENDER systems, RESEARCH personnel, EXPERTISE
- Abstract
A scientific paper recommender system (RS) is very helpful for literature searching in that it (1) helps novice researchers explore their own field and (2) helps experienced researchers explore new fields outside their area of expertise. However, existing RSs usually recommend relevant papers based on users' static interests, i.e., papers they cited in their past publication(s) or reading histories. In this paper, we propose a novel recommendation task based on users' dynamic interests during their paper-writing activity. This dynamism is revealed in (for example) the topic shift while writing the Introduction vs. Related Works section. In solving this task, we developed a new pipeline called "Section-based Sequential Scientific Paper Recommendation (S3PaR)", which recommends papers based on the context of the given user's currently written paper section. Our experiments demonstrate that this unique task and our proposed pipeline outperform existing standard RS baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. A coordinated active and reactive power optimization approach for multi-microgrids connected to distribution networks with multi-actor-attention-critic deep reinforcement learning.
- Author
- Dong, Lei, Lin, Hao, Qiao, Ji, Zhang, Tao, Zhang, Shiming, and Pu, Tianjiao
- Subjects
- DEEP reinforcement learning, REINFORCEMENT learning, DATA privacy, REACTIVE power, ELECTRICAL load, MICROGRIDS, POWER distribution networks
- Abstract
As a promising approach to managing distributed energy, microgrids continuously connected to distribution networks have attracted significant attention. However, the barriers to data sharing among different microgrids, the uncertainty of distributed renewable sources and loads, and the nonlinear optimization of power flow make traditional model-based optimization methods difficult to apply. In this paper, a data-driven coordinated active and reactive power optimization method is proposed for distribution networks with multi-microgrids. A multi-agent deep reinforcement learning (MADRL) method is used to protect the data privacy of each microgrid. Moreover, an attention mechanism, which focuses on crucial information, is introduced to overcome the problem of slow convergence caused by the dimensionality explosion of the optimized variables. Two types of agents, controlling discrete action and continuous action devices, respectively, are formulated in coordinated optimization, which reduces voltage violations and improves the system operation efficiency. In addition, in order to improve the performance of the online agent model under variable operation conditions, transfer learning is embedded in the training process of the MADRL. The proposed method is verified on a modified IEEE 33-bus distribution network with nine microgrids. • MAAC is proposed to accelerate the training process and alleviate the high-dimensional non-convex problem for microgrids. • Transfer learning is combined with MAAC to further accelerate the multi-agent convergence process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. A hyperspectral metal concentration inversion method using attention mechanism and graph neural network.
- Author
- Zhang, Lei
- Subjects
- GRAPH neural networks, HEAVY metal toxicology, CONVOLUTIONAL neural networks, DEEP learning, AGRICULTURE, PARTIAL least squares regression, HEAVY metals
- Abstract
Soil heavy metal contamination has emerged as a global environmental concern, posing significant risks to human health and ecosystem integrity. Hyperspectral technology, with its non-invasive, non-destructive, large-scale, and high spectral resolution capabilities, shows promising applications in monitoring soil heavy metal pollution. Traditional monitoring methods are often time-consuming, labor-intensive, and expensive, limiting their effectiveness for rapid, large-scale assessments. This study introduces a novel deep learning method, SpecMet, for estimating heavy metal concentrations in naturally contaminated agricultural soils using hyperspectral data. The SpecMet model extracts features from hyperspectral data using convolutional neural networks (CNNs) and achieves end-to-end prediction of soil heavy metal concentrations by integrating attention mechanisms and graph neural networks. Results demonstrate that the OR-SpecMet model, which utilizes raw spectral data, achieves optimal prediction performance, significantly surpassing traditional machine learning methods such as multiple linear regression, partial least squares regression, and support vector machine regression in estimating concentrations of lead (Pb), copper (Cu), cadmium (Cd), and mercury (Hg). Moreover, training specialized OR-SpecMet models for individual heavy metals better accommodates their unique spectral-concentration relationships, enhancing overall estimation accuracy while achieving a 20.3 % improvement in predicting low-concentration mercury. The OR-SpecMet method showcases the superior performance and extensive application potential of deep learning techniques in precise soil heavy metal pollution monitoring, offering new insights and reliable technical support for soil pollution prevention and agricultural ecosystem protection. The code and datasets used in this study are publicly available at: https://github.com/zhang2lei/metal.git. • Using attention mechanism and graph neural network for metal concentration inversion. • OR-SpecMet outperformed traditional methods in retrieving Pb, Cu, Cd, and Hg concentrations. • Metal-specific OR-SpecMet boosted prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. A User Segmentation Method in Heterogeneous Open Innovation Communities Based on Multilayer Information Fusion and Attention Mechanisms.
- Author
- Daradkeh, Mohammad
- Subjects
- COMMUNITIES, OPEN innovation, K-means clustering, VIRTUAL communities, DATA analytics, BUSINESS analytics, VIRTUAL networks
- Abstract
The heterogeneity and diversity of users and external knowledge resources is a hallmark of open innovation communities (OICs). Although user segmentation in heterogeneous OICs is a prominent and recurring issue, it has received limited attention in open innovation research and practice. Most existing user segmentation methods ignore the heterogeneity and embedded relationships that link users to communities through various items, resulting in limited accuracy of user segmentation. In this study, we propose a user segmentation method in heterogeneous OICs based on multilayer information fusion and attention mechanisms. Our method stratifies the OIC and creates user node embeddings based on different relationship types. Node embeddings from different layers are then merged to form a global representation of user fusion embeddings based on a semantic attention mechanism. The embedding learning of nodes is optimized using a multi-objective optimized node representation based on the Deep Graph Infomax (DGI) algorithm. Finally, the k-means algorithm is used to form clusters of users and partition them into distinct segments based on shared features. Experiments conducted on datasets collected from four OICs of business intelligence and analytics software show that our method outperforms multiple baseline methods based on unsupervised and supervised graph embeddings. This study provides methodological guidance for user segmentation based on structured community data and semantic social relations and provides insights for its practice in heterogeneous OICs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. Information agenda as an analogue of attention in sociomorphic neuronal networks.
- Author
- Andreyuk, Denis, Petrunin, Yuri, Shuranova, Ann, and Ushakov, Vadim
- Subjects
- NEURAL circuitry, SOCIAL groups, BIOLOGICAL systems, INFORMATION modeling, ARTIFICIAL neural networks
- Abstract
Information processing and decision-making performed by a social group can be modelled as neuronal network activity. Sociomorphic neuronal networks (SNNs) are in several ways different from neuromorphic networks; however, the majority of their critical features are similar. This paper describes a mechanism of SNNs that functions in the same way as attention does in biological intelligence systems. The news agenda serves as an initiating factor for this mechanism. Structured news items are analysed within a group in terms of their relation to the group's structure of values. While current values can differ slightly between group members, core values are constant for the whole group. The suggested approach provides a basis for developing new tools for modelling information processing in social groups as neuronal network contours. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
27. A semi-supervised learning approach for COVID-19 detection from chest CT scans.
- Author
- Zhang, Yong, Su, Li, Liu, Zhenxing, Tan, Wei, Jiang, Yinuo, and Cheng, Cheng
- Subjects
- SUPERVISED learning, COMPUTED tomography, RECEIVER operating characteristic curves, COVID-19, COVID-19 pandemic, CONVOLUTIONAL neural networks
- Abstract
• A semi-supervised learning method is proposed to improve the diagnostic efficiency of COVID-19 CT images. • Based on MixMatch, new data enhancement methods and training methods are introduced to reduce the risk of model over-fitting. • A data enhancement method with MixMatch is introduced to reduce model over-fitting. • A CNN with an attention mechanism is constructed to learn features from images more effectively. COVID-19 has spread rapidly all over the world and has affected more than 200 countries and regions. Early screening of suspected infected patients is essential for preventing and combating COVID-19. Computed Tomography (CT) is a fast and efficient tool which can quickly provide chest scan results. To reduce the burden on doctors of reading CTs, in this article, a high precision diagnosis algorithm of COVID-19 from chest CTs is designed for intelligent diagnosis. A semi-supervised learning approach is developed to solve the problem when only a small amount of labelled data is available. While following the MixMatch rules to conduct sophisticated data augmentation, we introduce a model training technique to reduce the risk of model over-fitting. At the same time, a new data enhancement method is proposed to modify the regularization term in MixMatch. To further enhance the generalization of the model, a convolutional neural network based on an attention mechanism is then developed that can extract multi-scale features on CT scans. The proposed algorithm is evaluated on an independent CT dataset of the chest from COVID-19 and achieves an area under the receiver operating characteristic curve (AUC) of 0.932, accuracy of 90.1%, sensitivity of 91.4%, specificity of 88.9%, and F1-score of 89.9%. The results show that the proposed algorithm can accurately diagnose whether a chest CT belongs to a positive or negative indication of COVID-19, and can help doctors to diagnose rapidly in the early stages of a COVID-19 outbreak. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2022
- Full Text
- View/download PDF
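Entry 27 builds on MixMatch. The sketch below shows the MixMatch-style label-guessing step (average predictions over several augmentations, then temperature-sharpen) that the abstract's training recipe starts from; the paper's modified regularization term and data enhancement are not reproduced here.

```python
# Hedged sketch of MixMatch-style label guessing for unlabelled CT slices:
# average the model's predictions over K augmentations, then sharpen with a temperature.
import torch

def guess_and_sharpen(model, augmented_batches, temperature=0.5):
    """augmented_batches: list of K tensors, each (batch, C, H, W)."""
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for x in augmented_batches]).mean(dim=0)
    sharpened = probs ** (1.0 / temperature)
    return sharpened / sharpened.sum(dim=1, keepdim=True)   # pseudo-labels

# Toy usage with a stand-in classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32, 2))
views = [torch.randn(4, 1, 32, 32) for _ in range(2)]       # K = 2 augmentations
print(guess_and_sharpen(model, views).shape)                 # (4, 2)
```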
28. Memory degradation induced by attention in recurrent neural architectures.
- Author
- Harvat, Mykola and Martín-Guerrero, José D.
- Subjects
- PARALLEL processing, MACHINE learning, MEMORY, KNOWLEDGE transfer, MACHINE translating
- Abstract
• The work proposes an empirical analysis of memory degradation in recurrent neural networks (RNNs). • Attention-based architectures tend to not use the RNN memory. • Direct input attention allows the RNN to work without attention interference. • The conjectures were tested on eight different problems. This paper studies the memory mechanisms in recurrent neural architectures when attention models are included. Pure-attention models like Transformers are more and more popular as they tend to outperform models with recurrent connections in many different tasks. Our conjecture is that attention prevents the recurrent connections from transferring information properly between consecutive steps. This conjecture is empirically tested using five different models, namely, a model without attention, a standard Luong attention model, a standard Bahdanau attention model, and our proposal to add attention to the inputs in order to fill the gap between recurrent and parallel architectures (for both Luong and Bahdanau attention models). Eight different problems are considered to assess the five models: a sequence-reverse copy problem, a sequence-reverse copy problem with repetitions, a filter sequence problem, a sequence-reverse copy problem with bigrams, and four translation problems (English to Spanish, English to French, English to German and English to Italian). The achieved results reinforce our conjecture on the interaction between attention and recurrence. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2022
- Full Text
- View/download PDF
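Entry 28 compares Luong and Bahdanau attention in recurrent models. For contrast with the dot-product score shown after entry 6, below is a minimal additive (Bahdanau-style) scoring function; dimensions are illustrative and this is not the paper's experimental code.

```python
# Minimal additive (Bahdanau-style) attention scores, the mechanism whose
# interaction with recurrence the paper studies; a sketch, not their test code.
import torch
import torch.nn as nn

hidden = 32
W_enc = nn.Linear(hidden, hidden, bias=False)   # projects encoder states
W_dec = nn.Linear(hidden, hidden, bias=False)   # projects decoder state
v = nn.Linear(hidden, 1, bias=False)            # maps to a scalar score

encoder_states = torch.randn(2, 7, hidden)      # (batch, src_len, hidden)
decoder_state = torch.randn(2, hidden)

# score_i = v^T tanh(W_enc h_i + W_dec s): additive, unlike Luong's dot product.
scores = v(torch.tanh(W_enc(encoder_states) + W_dec(decoder_state).unsqueeze(1))).squeeze(-1)
weights = scores.softmax(dim=1)                 # (batch, src_len)
context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
print(context.shape)                            # (2, hidden)
```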
29. Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms.
- Author
- Vanian, Vazgen, Zamanakos, Georgios, and Pratikakis, Ioannis
- Subjects
- DEEP learning, POINT cloud, ARTIFICIAL neural networks, COMPUTER vision, AUTONOMOUS vehicles
- Abstract
3D Semantic segmentation is a key element for a variety of applications in robotics and autonomous vehicles. For such applications, 3D data are usually acquired by LiDAR sensors resulting in a point cloud, which is a set of points characterized by its unstructured form and inherent sparsity. For the task of 3D semantic segmentation where the corresponding point clouds should be labeled with semantics, the current tendency is the use of deep learning neural network architectures for effective representation learning. On the other hand, various 2D and 3D computer vision tasks have used attention mechanisms which result in an effective re-weighting of the already learned features. In this work, we aim to investigate the role of attention mechanisms for the task of 3D semantic segmentation for autonomous driving, by identifying the significance of different attention mechanisms when adopted in existing deep learning networks. Our study is further supported by an extensive experimentation on two standard datasets for autonomous driving, namely Street3D and SemanticKITTI, that permit to draw conclusions at both a quantitative and qualitative level. Our experimental findings show that there is a clear advantage when attention mechanisms have been adopted, resulting in a superior performance. In particular, we show that the adoption of a Point Transformer in a SPVCNN network, results in an architecture which outperforms the state of the art on the Street3D dataset. [Display omitted] • Improving DL models for 3D point cloud semantic segmentation via attention mechanisms. • Evaluation study of attention modules for 3D semantic segmentation in LiDAR data. • Development and implementation of attention enhanced networks. • Evaluation of attention mechanisms in two datasets for autonomous driving. • Pros and cons of attention mechanisms, given their performance in the two datasets. • Fruitful discussion aiming to identify key features to be either adopted or avoided. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. R-AFNIO: Redundant IMU fusion with attention mechanism for neural inertial odometry.
- Author
- Yang, Bing, Wang, Xuan, Huang, Fengrong, Cao, Xiaoxiang, and Zhang, Zhenghua
- Subjects
- INERTIAL navigation systems, LANGUAGE models, AUTOMOTIVE electronics, SIGNAL processing, PRIOR learning, DEEP learning
- Abstract
Inertial Navigation Systems (INS) play an increasing role in automotive electronics and aerospace applications, particularly in autonomous vehicles, due to their low computational load, swift response, and high autonomy. However, substantial error accumulation poses a significant challenge for an INS, especially when employing low-cost Inertial Measurement Units (IMUs). This study proposes R-AFNIO, a convolutional and attention-based deep learning network, which is developed to mitigate error accumulation and to fuse IMU array data. Firstly, we introduce a self-supervised learning model to learn prior knowledge from IMU observations by masking redundant IMU data and reconstructing it, thereby reducing the noise of IMU data. Furthermore, we present an intelligent framework and employ an attention-based soft-weighting algorithm to mine the latent information within redundant IMUs. This approach effectively enhances fusion precision and strengthens robustness against erroneous observations. Notably, it is the first approach that utilizes deep learning to solve the information fusion problem of redundant IMUs (IMU arrays). Lastly, we propose a state-augmented tight integration algorithm to improve the local accuracy and robustness of the navigation system. We comprehensively validate the proposed R-AFNIO using both a publicly available dataset and a dataset collected by our team. Experimental results demonstrate that R-AFNIO delivers accurate and robust results on most indicators. Compared to several current studies, the absolute trajectory error shows reductions ranging from 20.2% to 97.7%, while the relative trajectory error exhibits reductions ranging from 18.5% to 98.7%. The ablation experiment further highlights the potency of R-AFNIO's self-supervised and redundant weighting modules. • First use of deep learning in a redundant inertial navigation system. • Upgraded BERT model to reduce noise in Redundant Inertial Measurement Units (RIMU). • Combined self-supervised and supervised learning for better RIMU fusion. • Integrated model-data fusion framework for robust state estimation. [ABSTRACT FROM AUTHOR] (see the code sketch after this entry)
- Published
- 2025
- Full Text
- View/download PDF
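Entry 30 fuses redundant IMUs with an attention-based soft-weighting algorithm. The sketch below shows the basic idea of scoring each IMU's features and fusing them by a softmax-weighted sum; shapes and layer sizes are assumptions, not the R-AFNIO design.

```python
# Hedged sketch of attention-based soft weighting over redundant IMUs: score each
# IMU's feature vector, normalise with softmax, and fuse by weighted sum.
import torch
import torch.nn as nn

class ImuSoftWeighting(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Tanh(),
                                   nn.Linear(feat_dim, 1))

    def forward(self, imu_feats):                 # (batch, n_imus, feat_dim)
        weights = self.score(imu_feats).softmax(dim=1)   # (batch, n_imus, 1)
        fused = (weights * imu_feats).sum(dim=1)         # down-weights noisy units
        return fused, weights.squeeze(-1)

feats = torch.randn(8, 4, 64)                     # features from a 4-IMU array
fused, w = ImuSoftWeighting()(feats)
print(fused.shape, w.shape)                       # (8, 64) (8, 4)
```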
31. Unsupervised attention-guided domain adaptation model for Acute Lymphocytic Leukemia (ALL) diagnosis.
- Author
- Baydilli, Yusuf Yargı
- Subjects
- LYMPHOBLASTIC leukemia, GENERATIVE adversarial networks, DEEP learning, MARGINAL distributions, BLOOD cells
- Abstract
Acute lymphocytic leukemia (ALL) is a dangerous disease characterized by an increased number of abnormal blood cells in the blood. Its early diagnosis and treatment are crucial, as it can lead to severe consequences if left untreated. Manual examination of blood samples by pathologists and/or hematologists is time-consuming and requires expert skill, so automated and fast solutions need to be developed. However, the marginal data distribution of samples taken from subjects under certain conditions is a major obstacle to building a model that works on datasets obtained under different conditions. Labeling the new dataset also means extra costs. Considering these reasons, this study proposes an attention-enhanced generative adversarial network (GAN) model to move two datasets with different structures into the same feature space. The proposed model is used to transfer the blast and normal cells to the target domain regardless of the background to eliminate the domain difference between the datasets. By learning the complex structure and class features of the cells in an unsupervised manner, the labeling cost is eliminated and it is shown that the trained classifier achieves better results than other domain adaptation methods in the literature. At the end of the study, it was seen that attention mechanisms are highly skilled in extracting the useful parts from the data. In this way, domain mismatch between datasets could be eliminated. • This study aims to address the domain shift problem between two Acute lymphocytic leukemia (ALL) datasets. • An attention mechanism guided domain adaptation method is proposed. • The model performs feature transfer between segmented and unsegmented cells. • The proposed model outperformed the SOTA models. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
32. End-to-end time-dependent probabilistic assessment of landslide hazards using hybrid deep learning simulator.
- Author
-
Huang, Menglu, Nishimura, Shin-ichi, Shibata, Toshifumi, and Wang, Ze Zhou
- Subjects
- *
LONG short-term memory , *LANDSLIDE hazard analysis , *CONVOLUTIONAL neural networks , *DEEP learning , *NUMERICAL calculations , *LANDSLIDES - Abstract
Early warning detection of landslide hazards often requires real-time or near real-time predictions, which can be challenging due to the presence of multiple geo-uncertainties and time-variant external environmental loadings. Propagating these uncertainties at the system level to understand the spatiotemporal behavior of slopes often requires time-consuming numerical calculations, significantly hindering the establishment of an early warning system. This paper presents a hybrid deep learning simulator, termed PCLA-Net, which fuses parallel convolutional neural networks (CNNs) and long short-term memory (LSTM) networks through attention mechanisms to facilitate time-dependent probabilistic assessment of landslide hazards. PCLA-Net features two novelties. First, it is capable of simultaneously handling both temporal and spatial information: CNNs specialize in interpreting spatial data, while LSTM excels in handling time-variant data, and the two modules are coupled through two attention mechanisms to probabilistically predict the spatiotemporal behavior of slopes. Second, PCLA-Net realizes end-to-end predictions. In this paper, the Liangshuijing landslide in the Three Gorges Reservoir area of China is used to illustrate PCLA-Net. It is first validated and then compared with existing techniques to demonstrate its improved predictive capabilities. The proposed PCLA-Net simulator achieves the same level of accuracy with at least a 50% reduction in computational resources. [ABSTRACT FROM AUTHOR]
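One way to picture coupling a spatial CNN branch and a temporal LSTM branch through attention is the rough sketch below; all class names, input shapes, and layer sizes are assumptions, not the published PCLA-Net architecture.

```python
import torch
import torch.nn as nn

class CNNLSTMAttentionFusion(nn.Module):
    """Illustrative parallel CNN (spatial) + LSTM (temporal) branches fused by attention."""
    def __init__(self, in_ch=1, seq_feat=8, hidden=32):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, hidden))
        self.lstm = nn.LSTM(seq_feat, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)          # scores the two branch embeddings
        self.head = nn.Linear(hidden, 1)          # e.g. a probabilistic slope-response output

    def forward(self, grid, series):              # grid: (B,1,H,W), series: (B,T,seq_feat)
        spatial = self.cnn(grid)                  # (B, hidden) spatial embedding
        temporal, _ = self.lstm(series)
        temporal = temporal[:, -1]                # last time step, (B, hidden)
        branches = torch.stack([spatial, temporal], dim=1)   # (B, 2, hidden)
        w = torch.softmax(self.attn(branches), dim=1)        # attention over the two branches
        fused = (w * branches).sum(dim=1)
        return self.head(fused)

out = CNNLSTMAttentionFusion()(torch.randn(4, 1, 16, 16), torch.randn(4, 12, 8))
```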
- Published
- 2025
- Full Text
- View/download PDF
33. HybridCBAMNet: Enhancing time series binary classification with convolutional recurrent networks and attention mechanisms.
- Author
-
Huang, Mei-Ling and Yang, Yi-Ting
- Subjects
- *
ARTIFICIAL neural networks , *TIME series analysis , *FEATURE extraction , *INTERNET of things , *BIG data , *RECURRENT neural networks - Abstract
• Integrates convolutional and recurrent neural networks on time series data. • Captures relevant patterns and representations from one-dimensional sequence data. • Attention mechanisms focus on crucial information within the input data. • Captures temporal dependencies and contextual information in both directions. • HybridCBAMNet outperforms several state-of-the-art models on the UCR archive. The rapid advancement of Internet of Things technology and the increasing availability of big data have resulted in exponential growth of time series data, highlighting a pressing need for effective classification methods. This study introduces HybridCBAMNet, a novel convolutional recurrent neural network model enhanced with attention mechanisms for binary time series classification. The architecture integrates Conv1D-based feature extraction modules to extract relevant features, alongside attention enhancement modules and convolutional block attention modules. Additionally, bidirectional recurrent units capture temporal dependencies and contextual information in both forward and backward directions. The model achieves top F1-scores in seven of thirty-six binary classification tasks, significantly surpassing the performance of fourteen existing state-of-the-art models on the UCR archive. These results demonstrate that HybridCBAMNet not only enhances classification accuracy but also improves generalization, contributing valuable insights to the field of time series analysis. [ABSTRACT FROM AUTHOR]
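A convolutional block attention module adapted to one-dimensional sequences could look roughly like the sketch below; it follows the generic CBAM recipe (channel attention followed by spatial attention over the time axis) rather than the paper's exact HybridCBAMNet configuration.

```python
import torch
import torch.nn as nn

class CBAM1d(nn.Module):
    """CBAM-style block for 1-D sequences: channel attention then temporal attention (sketch)."""
    def __init__(self, channels, reduction=4, spatial_kernel=7):
        super().__init__()
        self.channel_mlp = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                         nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv1d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):                          # x: (B, C, T)
        # channel attention from average- and max-pooled descriptors
        avg = self.channel_mlp(x.mean(dim=-1))
        mx = self.channel_mlp(x.amax(dim=-1))
        x = x * torch.sigmoid(avg + mx).unsqueeze(-1)
        # temporal ("spatial") attention from channel-wise mean and max statistics
        stats = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(stats))

y = CBAM1d(channels=32)(torch.randn(8, 32, 100))
```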
- Published
- 2025
- Full Text
- View/download PDF
34. ERAT: Eyeglasses removal with attention.
- Author
-
Zhang, Haitao and Guo, Jingtao
- Subjects
- *
EYEGLASSES , *ATTENTION , *SUPERVISION , *EDITING - Abstract
• We propose a new solution for high-quality eyeglasses removal that utilizes an attention mechanism to detect eyeglasses, remove them, and generate visually realistic new content for the removal region. • A novel eyeglasses attention network learns to precisely detect eyeglasses by employing a novel combination of label filtering and attention mask filtering strategies. • A three-path eyeglasses removal network is proposed, which uses the eyeglasses attention mask and combines the idea of data synthesis to achieve eyeglasses removal through supervised and adversarial training. Eyeglasses removal has witnessed substantial progress in recent years; the key is to accurately detect the eyeglasses and generate visually realistic new content in the removal region. However, current methods either cannot remove eyeglasses cleanly because they fail to detect them accurately, or fail to synthesize visually realistic eye information in the removal region. In this paper, we propose a new solution for high-quality eyeglasses removal that employs the attention mechanism to address these two key challenges. Specifically, our method divides the task into two stages. The first stage learns the eyeglasses attention map by simultaneously imposing a label filtering strategy and an attention mask filtering strategy on both the latent feature space and the image space, which mainly addresses the challenge of eyeglasses detection. The second stage fuses the attention map into a novel three-path network to remove eyeglasses and synthesize visually realistic content in the removal region. Experiments show that our method outperforms almost all existing techniques on the task of eyeglasses removal. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
35. FBSA-Net: A novel model based on attention mechanisms for emotion recognition in VR and 2D scenes.
- Author
-
Xie, Jialan, Luo, Yutong, Lan, Ping, and Liu, Guangyuan
- Subjects
- *
EMOTION recognition , *VIRTUAL reality , *EMOTIONAL state , *ELECTROENCEPHALOGRAPHY , *EMOTIONS - Abstract
Recent studies have found that electroencephalographic (EEG) features from different frequency bands and different brain regions contribute differently to emotion recognition (ER) in virtual reality (VR) and two-dimensional (2D) scenes. Despite achievements in ER in both VR and 2D environments, little prior effort has been devoted to developing a unified ER model applicable to both. We propose a novel Frequency-Band-Spatial-based Attention Network, named FBSA-Net, for ER in VR and 2D scenes. Specifically, FBSA-Net adaptively captures the frequency-band-spatial relationships of EEG signals through cascade fusion of a frequency-band attention (FBA) module and a spatial attention (SA) module. The FBA module automatically assigns attention weights to each band feature. The SA module then applies the ProbeParse attention mechanism to adaptively explore channel relationships both within and across brain regions, thus facilitating a comprehensive understanding of the global features of the EEG signals. The model's efficacy was validated on the VRSDEED dataset, created from emotion induction experiments using VR and 2D induction programs. Multiple comparative experiments show that FBSA-Net achieves state-of-the-art recognition results on the VRSDEED dataset, with average recognition accuracies of 96.63% and 96.49% in VR and 2D scenes, respectively. The results indicate that FBSA-Net's contribution lies not only in performing ER with the same model in VR and 2D environments, but also in its excellent ability to distinguish between the emotional states in these two scenarios, providing a valuable reference for using a single model for ER. [ABSTRACT FROM AUTHOR]
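The frequency-band attention idea, assigning one soft weight to each band's feature vector, can be sketched as follows; the channel count and band count are assumptions, and this is not the published FBA module.

```python
import torch
import torch.nn as nn

class FrequencyBandAttention(nn.Module):
    """Learns one soft weight per EEG frequency band feature vector (illustrative sketch)."""
    def __init__(self, feat_dim=62):                   # e.g. 62 EEG channels per band
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, x):                              # x: (batch, num_bands, feat_dim)
        w = torch.softmax(self.score(x), dim=1)        # weights sum to 1 across bands
        return x * w                                   # re-weighted band features

out = FrequencyBandAttention()(torch.randn(4, 5, 62))  # 5 bands, e.g. delta through gamma
```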
- Published
- 2024
- Full Text
- View/download PDF
36. Classify breast cancer pathological tissue images using multi-scale bar convolution pooling structure with patch attention.
- Author
-
Guo, Dongen, Lin, Yuyao, Ji, Kangyi, Han, Linbo, Liao, Yongbo, Shen, Zhen, Feng, Jiangfan, and Tang, Man
- Subjects
ARTIFICIAL neural networks ,CONVOLUTIONAL neural networks ,CANCER diagnosis ,BREAST cancer ,PATHOLOGISTS - Abstract
Pathological diagnosis plays a crucial role in the diagnosis and treatment of breast cancer. It is of profound clinical significance to construct a neural network model that can automatically classify breast cancer pathological tissue images (BCPTI) to assist pathologists in making accurate diagnoses. Although many convolutional neural network models have shown promising results in the recognition of BCPTI, they often fail to take full advantage of the elongated pathological features present in BCPTI. To address this problem, we propose a new feature extraction architecture that extracts rectangular and elongated features in BCPTI through multi-scale strip convolution and pooling. In addition, we propose a novel attention mechanism that increases the weights of key features in both the channel and spatial dimensions. More importantly, to alleviate the weak ability of convolutional neural networks to extract global features, the spatial attention divides the image into nine patches and feeds them into a multi-layer perceptron to form weights, increasing the global feature expression ability of the model. We apply these innovations to the DenseNet model and remove batch normalization and activation layers from the original model to maintain feature diversity. Finally, the binary classification accuracy on the BreakHis dataset reaches 99.88%, and the eight-class accuracy reaches 97.62%. • This paper proposes a novel CNN for classification on the BreakHis dataset with competitive performance. • An attention mechanism using chunking to aggregate local features is proposed. • A multi-scale bar convolution and pooling multi-branch structure is proposed. • We verify and compare the performance of the proposed models and modules from multiple perspectives. [ABSTRACT FROM AUTHOR]
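One plausible reading of the nine-patch spatial attention is sketched below: the feature map is split into a 3x3 grid and an MLP produces one weight per patch. This is an illustrative interpretation with assumed shapes, not the authors' code.

```python
import torch
import torch.nn as nn

class NinePatchAttention(nn.Module):
    """Splits a feature map into a 3x3 grid and weights each patch with an MLP (sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // 2), nn.ReLU(),
                                 nn.Linear(channels // 2, 1))

    def forward(self, x):                              # x: (B, C, H, W), H and W divisible by 3
        b, c, h, w = x.shape
        patches = x.unfold(2, h // 3, h // 3).unfold(3, w // 3, w // 3)  # (B, C, 3, 3, h/3, w/3)
        desc = patches.mean(dim=(-1, -2)).permute(0, 2, 3, 1).reshape(b, 9, c)  # per-patch descriptor
        weights = torch.softmax(self.mlp(desc), dim=1).reshape(b, 1, 3, 3)      # one weight per patch
        weights = torch.repeat_interleave(torch.repeat_interleave(weights, h // 3, dim=2), w // 3, dim=3)
        return x * weights                             # broadcast patch weights back over pixels

y = NinePatchAttention(channels=64)(torch.randn(2, 64, 24, 24))
```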
- Published
- 2024
- Full Text
- View/download PDF
37. Lightweight attention mechanisms for EEG emotion recognition for brain computer interface.
- Author
-
Gunda, Naresh Kumar, Khalaf, Mohammed I., Bhatnagar, Shaleen, Quraishi, Aadam, Gudala, Leeladhar, Venkata, Ashok Kumar Pamidi, Alghayadh, Faisal Yousef, Alsubai, Shtwai, and Bhatnagar, Vaibhav
- Subjects
- *
EMOTION recognition , *BRAIN-computer interfaces , *COMPUTER interfaces , *DIFFERENTIAL entropy , *DEEP learning - Abstract
In the realm of brain-computer interfaces (BCI), identifying emotions from electroencephalogram (EEG) data is a difficult endeavor because of the volume of data, the intricacy of the signals, and the many channels that make up the signals. Using dual-stream structure scaling and multiple attention mechanisms, a lightweight network (LDMGEEG) is provided to maximize the accuracy and performance of EEG-based emotion identification. The aim is to reduce the number of computational parameters while maintaining the current level of classification accuracy. The network employs a symmetric dual-stream architecture to separately assess time-domain and frequency-domain spatio-temporal maps constructed using differential entropy features of EEG signals as inputs. The experimental results show that, after significantly lowering the number of parameters, the model achieved the best possible performance in the field, with 95.18% accuracy on the SEED dataset, while reducing the number of parameters by 98% compared to existing models. The proposed channel-time/frequency-space multiple attention and post-attention methods enhance the model's ability to aggregate features and result in lightweight performance. • Identifying emotions from EEG data is difficult due to signal complexity. • A lightweight network is provided to improve the accuracy of emotion identification. • Differential entropy features of EEG signals are used as inputs. • The proposed model achieves 95.18% accuracy on the SEED dataset. • The proposed method enhances the model's ability to aggregate features. [ABSTRACT FROM AUTHOR]
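Differential entropy of a band-passed, approximately Gaussian EEG segment has a closed form, DE = 0.5 ln(2 pi e sigma^2). A minimal NumPy sketch of the feature computation (shapes and window length are assumptions) is:

```python
import numpy as np

def differential_entropy(band_signal: np.ndarray) -> np.ndarray:
    """Differential entropy per channel of a band-passed EEG segment,
    DE = 0.5 * ln(2 * pi * e * sigma^2), assuming approximately Gaussian samples."""
    variance = np.var(band_signal, axis=-1)            # band_signal: (channels, samples)
    return 0.5 * np.log(2 * np.pi * np.e * variance)

# e.g. one 1-second window of 62-channel EEG already filtered to the alpha band
de_alpha = differential_entropy(np.random.randn(62, 200))
```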
- Published
- 2024
- Full Text
- View/download PDF
38. A novel multimodal depression diagnosis approach utilizing a new hybrid fusion method.
- Author
-
Zhang, Xiufeng, Li, Bingyi, and Qi, Guobin
- Subjects
CONVOLUTIONAL neural networks ,STANDARD deviations ,DATA mining ,DEEP learning ,FACIAL expression - Abstract
• The hybrid fusion method is superior to traditional single-modal fusion methods and balances the information extraction capability of a single modality with the information fusion capability of multiple modalities. • Due to the specificity of depression tasks, long-term information cannot be ignored; our method considers both short-term information extraction and long-term dependencies. • In the feature extraction and fusion stage, the same modalities with different representations are fused and input into the network for training, in order to enhance depression-related features in the data. In recent years, research has found that depression primarily affects patients' language expression and facial expressions. Furthermore, facial expressions and intonation in speech naturally coexist, making facial and vocal information core indicators for depression identification. It is imperative to explore the effective use of deep learning methods for multimodal depression detection. We propose a novel trilateral bimodal encoding model (MEN), attentional decision fusion (ADF), and a feature extraction fusion strategy. We employ a hybrid fusion approach that combines early intra-modality fusion with late inter-modality fusion for multimodal depression diagnosis. In the feature extraction fusion component, we combine different representations of the same modality before inputting them into the network for training, enhancing depression-related features in the data. Through our multimodal encoding network, we extract frame-level information using Convolutional Neural Networks (CNN) while capturing long-term context and dependencies with Bidirectional Long Short-Term Memory (BiLSTM). Finally, the three streams of information are integrated through attention fusion in our Attention Decision Fusion (ADF) module for depression score regression. Extensive experiments were conducted on two public datasets, AVEC2013 and AVEC2014. The mean absolute error/root mean squared error (MAE/RMSE) scores for predicting depression scores were 6.48/8.91 and 7.01/9.38, respectively, demonstrating that our hybrid fusion method outperforms traditional early or late fusion methods. [ABSTRACT FROM AUTHOR]
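The attention-based decision fusion of the three streams could be sketched roughly as follows; the stream dimensionality and module name are assumptions, not the paper's ADF implementation.

```python
import torch
import torch.nn as nn

class AttentionDecisionFusion(nn.Module):
    """Fuses three modality streams with learned attention weights before regression (sketch)."""
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.regressor = nn.Linear(dim, 1)             # depression score output

    def forward(self, streams):                        # list of three (B, dim) embeddings
        stacked = torch.stack(streams, dim=1)          # (B, 3, dim)
        w = torch.softmax(self.score(stacked), dim=1)  # attention weights over the streams
        fused = (w * stacked).sum(dim=1)
        return self.regressor(fused).squeeze(-1)

score = AttentionDecisionFusion()([torch.randn(4, 128) for _ in range(3)])
```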
- Published
- 2024
- Full Text
- View/download PDF
39. Multi-scale network with attention mechanism for underwater image enhancement.
- Author
-
Tao, Ye, Tang, Jinhui, Zhao, Xinwei, Zhou, Chen, Wang, Chong, and Zhao, Zhonglei
- Subjects
- *
IMAGE intensifiers , *IMAGE registration , *IMAGE segmentation - Abstract
Underwater images suffer from severe degradation due to complicated underwater environments. Although numerous approaches have been proposed, accurately correcting color bias while effectively enhancing contrast remains a difficult problem. To address this issue, a dedicated Multi-scale Network with Attention mechanism (MNA) is introduced in this work. Concretely, MNA has four key characteristics: (a) more convolution layers are set in shallow flows, (b) connections from high-level to adjacent low-level streams are introduced progressively, (c) a simplified dual attention mechanism is embedded in the conventional residual block, and (d) a channel attention module fuses multi-scale information instead of the conventional summation operation. Extensive experiments demonstrate that our MNA achieves better performance than some well-recognized technologies, and an ablation study proves the effectiveness of each component. In addition, extended applications demonstrate the improvement of our MNA in local feature point matching and image segmentation. [ABSTRACT FROM AUTHOR]
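Characteristic (d), fusing multi-scale information with channel attention rather than plain summation, might look like this minimal sketch; the class name and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Fuses two same-shaped multi-scale feature maps with a channel-attention gate (sketch)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, channels // reduction), nn.ReLU(),
                                  nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, coarse, fine):                   # both (B, C, H, W)
        g = self.gate(coarse + fine).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1) gate per channel
        return g * coarse + (1.0 - g) * fine           # channel-wise soft selection, not a plain sum

out = ChannelAttentionFusion(32)(torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64))
```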
- Published
- 2024
- Full Text
- View/download PDF
40. Enhancing rail safety through real-time defect detection: A novel lightweight network approach.
- Author
-
Cao, Yuan, Liu, Yue, Sun, Yongkui, Su, Shuai, and Wang, Feng
- Subjects
- *
INDUSTRIAL capacity , *SPINE - Abstract
The rapid detection of internal rail defects is critical to maintaining railway safety, but this task faces a significant challenge due to the limited computational resources of onboard detection systems. This paper presents YOLOv8n-LiteCBAM, an advanced network designed to enhance the efficiency of rail defect detection. The network adopts a lightweight DepthStackNet backbone to replace the existing CSPDarkNet backbone. Further optimization is achieved through model pruning and the incorporation of a novel Bidirectional Convolutional Block Attention Module (BiCBAM). Additionally, inference acceleration is realized via ONNX Runtime. Experimental results on the rail defect dataset demonstrate that our model achieves 92.9% mAP with inference speeds of 136.79 FPS on the GPU and 38.36 FPS on the CPU. The model's inference speed outperforms that of other lightweight models and meets the real-time detection requirements of Rail Flaw Detection (RFD) vehicles traveling at 80 km/h. Consequently, the YOLOv8n-LiteCBAM network shows potential for industrial application in the expedited detection of internal rail defects. • Introducing a DepthStackNet backbone and model pruning to improve detection speed. • Designing a Bidirectional Convolutional Block Attention Module to boost performance. • Ensuring real-time detection meets the 80 km/h speed requirement for RFD vehicles. [ABSTRACT FROM AUTHOR]
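Running an exported detector through ONNX Runtime follows the standard API shown below; the model filename, input shape, and preprocessing are placeholders, not the paper's artifacts.

```python
import numpy as np
import onnxruntime as ort

# "rail_defect.onnx" is a hypothetical exported model; input layout assumed to be NCHW float32.
session = ort.InferenceSession("rail_defect.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

frame = np.random.rand(1, 3, 640, 640).astype(np.float32)     # one preprocessed image
detections = session.run(None, {input_name: frame})           # raw network outputs to post-process
```

Switching the provider list to `["CUDAExecutionProvider"]` (when the GPU package is installed) is the usual way to move the same session onto the GPU.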
- Published
- 2024
- Full Text
- View/download PDF
41. Visual attention methods in deep learning: An in-depth survey.
- Author
-
Hassanin, Mohammed, Anwar, Saeed, Radwan, Ibrahim, Khan, Fahad Shahbaz, and Mian, Ajmal
- Subjects
- *
DEEP learning , *COMPUTER vision , *TRANSFORMER models , *ATTENTION , *RESEARCH personnel - Abstract
Inspired by the human cognitive system, attention is a mechanism that imitates the human cognitive awareness about specific information, amplifying critical details to focus more on the essential aspects of data. Deep learning has employed attention to boost performance for many applications. Interestingly, the same attention design can suit processing different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated into one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey on attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers only cover a single category in self-attention out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques, categorizing them by their most prominent features. We initiate our discussion by introducing the fundamental concepts behind the success of the attention mechanism. Next, we furnish some essentials such as the strengths and limitations of each attention category, describe their fundamental building blocks, basic formulations with primary usage, and applications specifically for computer vision. We also discuss the challenges and general open questions related to attention mechanisms. Finally, we recommend possible future research directions for deep attention. All the information about visual attention methods in deep learning is provided at https://github.com/saeed-anwar/VisualAttention • An in-depth exploration of attention techniques for gaps, context, and insights. • Categorizing mechanisms for models to distinguish relevant information. • Responding to the attention-related paper surge, guiding adoption in vision. • Diverging from transformer-centric surveys, offering a unique vision perspective. • Providing valuable recommendations for navigating challenges and future direction. [ABSTRACT FROM AUTHOR]
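For readers new to the area, the scaled dot-product attention that many surveyed variants build on can be written in a few lines; this is a generic sketch, not tied to any single method in the survey.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V — the basic block many attention variants extend."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # pairwise query/key similarity
    weights = torch.softmax(scores, dim=-1)            # normalize over keys
    return weights @ v                                  # weighted sum of values

out = scaled_dot_product_attention(torch.randn(2, 5, 16),
                                   torch.randn(2, 7, 16),
                                   torch.randn(2, 7, 16))
```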
- Published
- 2024
- Full Text
- View/download PDF
42. AFLEMP: Attention-based Federated Learning for Emotion recognition using Multi-modal Physiological data.
- Author
-
Gahlan, Neha and Sethia, Divyashikha
- Subjects
FEDERATED learning ,AFFECTIVE computing ,EMOTION recognition ,ARTIFICIAL neural networks ,AFFECTIVE neuroscience ,DATA privacy ,TRANSFORMER models - Abstract
Automated emotion recognition systems utilizing physiological signals are essential for affective computing and intelligent interaction. Combining multiple physiological signals is more precise and effective for accurately assessing a person's emotional state. Automated emotion recognition systems using conventional machine learning techniques require complete access to the physiological data for emotion state classification, compromising sensitive data privacy. Federated Learning (FL) resolves this issue by preserving the user's privacy and sensitive physiological data while recognizing emotions. However, existing FL methods have limitations in handling data heterogeneity in physiological data and do not measure communication efficiency and scalability. In response to these challenges, this paper proposes a novel framework called AFLEMP (Attention-based Federated Learning for Emotion recognition using Multi-modal Physiological data) integrating an attention mechanism-based Transformer with an Artificial Neural Network (ANN) model. The framework reduces two types of data heterogeneity: (1) Variation Heterogeneity (VH) in multi-modal EEG, GSR, and ECG physiological signal data, using attention mechanisms, and (2) Imbalanced Data Heterogeneity (IDH) in the FL environment, using scaled weighted federated averaging. This paper validates the proposed AFLEMP framework on two publicly available emotion datasets, AMIGOS and DREAMER, achieving average accuracies of 88.30% and 84.10%, respectively. The proposed AFLEMP framework proves robust, scalable, and efficient in communication. AFLEMP is the first FL framework proposed for emotion recognition using multi-modal physiological signals while reducing data heterogeneity, and it outperforms existing FL methods. • AFLEMP for automated emotion recognition using multi-modal physiological signals: EEG, ECG, and GSR. • Federated Learning preserves sensitive physiological data during emotion recognition. • Addresses variation heterogeneity (VH) and imbalanced data heterogeneity (IDH) in the FL environment. • Melds different attention mechanisms and a Transformer with an Artificial Neural Network (ANN). [ABSTRACT FROM AUTHOR]
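Scaled weighted federated averaging, where each client's parameters count in proportion to its sample share, can be sketched generically as follows; this is a FedAvg-style illustration with made-up names, not the AFLEMP implementation.

```python
import torch

def weighted_fedavg(client_states, client_sizes):
    """Weighted federated averaging (sketch): parameters weighted by each client's sample share."""
    total = float(sum(client_sizes))
    global_state = {}
    for name in client_states[0]:
        global_state[name] = sum(
            (n / total) * state[name].float() for state, n in zip(client_states, client_sizes)
        )
    return global_state

# e.g. three clients with unbalanced local datasets (IDH scenario)
states = [{"w": torch.randn(4, 4)} for _ in range(3)]
merged = weighted_fedavg(states, client_sizes=[120, 40, 300])
```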
- Published
- 2024
- Full Text
- View/download PDF
43. SI-News: Integrating social information for news recommendation with attention-based graph convolutional network.
- Author
-
Zhu, Peng, Cheng, Dawei, Luo, Siqiang, Yang, Fangzhou, Luo, Yifeng, Qian, Weining, and Zhou, Aoying
- Subjects
- *
ELECTRONIC newspapers , *MACHINE learning - Abstract
High-quality news recommendation relies heavily on accurate and timely representations of news documents and user interests. Social information, which usually contains the most recent information about the activities of users and their friends, naturally reflects the dynamics and diversity of user interests. However, existing news recommendation approaches often overlook these dynamic signals and thus achieve suboptimal performance. In this paper, we propose a novel approach that embeds users' interests from their social information via an attentional graph convolutional network (GCN). We also improve news representations by jointly optimizing the titles and contents of news via attention mechanisms. Extensive experiments on three benchmark datasets show that our approach effectively improves news recommendation performance compared with state-of-the-art baselines. We also evaluate our model on a real-world dataset, and the results demonstrate the superior performance of the proposed techniques in industry-level applications. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. Meta-learning meets the Internet of Things: Graph prototypical models for sensor-based human activity recognition.
- Author
-
Zheng, Wenbo, Yan, Lan, Gou, Chao, and Wang, Fei-Yue
- Subjects
- *
HUMAN activity recognition , *INTERNET of things , *DEEP learning , *WEARABLE technology - Abstract
With the rapid growth of the Internet of Things (IoT), smart systems and applications are equipped with an increasing number of wearable sensors and mobile devices. These sensors are used not only to collect data but, more importantly, to assist in tracking and analyzing daily human activities. Sensor-based human activity recognition is a research hotspot and has started to employ deep learning approaches to supersede traditional shallow learning methods that rely on hand-crafted features. Although many successful methods have been proposed, three challenges remain: (1) a deep model's performance depends heavily on the data size; (2) a deep model cannot explicitly capture abundant sample distribution characteristics; (3) a deep model cannot jointly consider sample features, sample distribution characteristics, and the relationship between the two. To address these issues, we propose a meta-learning-based graph prototypical model with a priority attention mechanism for sensor-based human activity recognition. This approach learns not only sample features and sample distribution characteristics via the meta-learning-based graph prototypical model, but also embeddings derived from the priority attention mechanism, which mines and utilizes relations between sample features and sample distribution characteristics. Moreover, the knowledge learned through our approach can be seen as a prior applicable to improving performance on other general reasoning tasks. Experimental results on fourteen datasets demonstrate that the proposed approach significantly outperforms other state-of-the-art methods. In addition, experiments applying our model to two other tasks show that it effectively supports other recognition tasks related to human activity and improves performance on their datasets. • This paper presents a novel meta-learning-based graph prototypical model. • We design a novel attention mechanism called the priority attention mechanism. • Results on fourteen datasets validate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
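The prototypical part of such a model, classifying a query by its distance to class prototypes computed as mean support embeddings, can be illustrated with a short sketch; embedding sizes and labels below are made up, and the graph and priority-attention components are omitted.

```python
import torch

def prototypical_predict(support, support_labels, query, num_classes):
    """Prototypical-network style classification (sketch): nearest mean-embedding prototype."""
    prototypes = torch.stack([support[support_labels == c].mean(dim=0) for c in range(num_classes)])
    dists = torch.cdist(query, prototypes)             # (num_query, num_classes) Euclidean distances
    return dists.argmin(dim=1)                         # predicted activity class per query

support = torch.randn(20, 64)                          # 20 embedded support samples
labels = torch.arange(20) % 4                          # 4 activity classes, 5 samples each
preds = prototypical_predict(support, labels, torch.randn(5, 64), num_classes=4)
```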
- Published
- 2022
- Full Text
- View/download PDF
45. Encoding global semantic and localized geographic spatial-temporal relations for traffic accident risk prediction.
- Author
-
Alhaek, Fares, Li, Tianrui, Rajeh, Taha M., Javed, Muhammad Hafeez, and Liang, Weichao
- Abstract
The proliferation of vehicles and the intricate layout of road systems have contributed to a significant rise in traffic accidents, posing a pressing global concern. Despite the advancements facilitated by deep learning, several challenges persist in traffic accident prediction. First, the sparsity of accident data in certain city regions presents a significant obstacle, particularly when attempting fine-grained predictions at a local level. Second, the intricate spatial-temporal relations inherent in traffic accident data pose a challenge for existing prediction models. In response to these challenges, this paper presents GLST-TARP, a novel model for predicting traffic accident risk in urban environments by leveraging both global semantic and localized geographic spatial-temporal relations. Specifically, we construct multi-graphs to encode global static and dynamic spatial relations, incorporating attention mechanisms to adaptively focus on relevant information and temporal relations. Additionally, we employ channel-wise convolutional neural network blocks to extract localized geographic features and enhance predictive accuracy. The proposed model is trained with a Huber loss function tailored for regression tasks, mitigating the impact of zero values during optimization. Experimental results demonstrate that GLST-TARP outperforms state-of-the-art methods in predicting traffic accident risk, showcasing its potential for enhancing urban safety and transportation management. [ABSTRACT FROM AUTHOR]
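The Huber loss mentioned above is quadratic for small residuals and linear beyond a threshold delta, which softens the effect of the many zero-risk cells; a minimal sketch (the delta value is an assumption) is:

```python
import torch

def huber_loss(pred, target, delta=1.0):
    """Huber loss: 0.5*e^2 for |e| <= delta, delta*(|e| - 0.5*delta) otherwise."""
    err = pred - target
    abs_err = err.abs()
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return torch.where(abs_err <= delta, quadratic, linear).mean()

# e.g. predicted accident risk over 100 grid cells, mostly-zero targets
loss = huber_loss(torch.randn(8, 100), torch.zeros(8, 100))
```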
- Published
- 2025
- Full Text
- View/download PDF
46. Pairwise attention network for cross-domain image recognition.
- Author
-
Gao, Zan, Liu, Yanbo, Xu, Guangpin, and Wen, Xianbin
- Subjects
- *
IMAGE recognition (Computer vision) , *MACHINE learning , *LEARNING communities , *DATA distribution - Abstract
In recent years, domain adaptation has received wide attention from the machine learning community because of differences in data distribution or the lack of training data in practical machine learning tasks. In this work, we propose a Pairwise Attention Network (PAN) for cross-domain image recognition. In this model, different local features and the global feature are concatenated to obtain different attention estimators, which are then combined to produce the attention map. In this way, we can focus on the important parts of an image and ignore irrelevant regions. Moreover, attention consistency is embedded in PAN to ensure consistent regions of interest within the same class. Besides, to improve feature discrimination, a discriminative embedding subspace is learned that maps positive sample pairs close together on a hypersphere and pushes negative sample pairs apart. Extensive experimental results on the MNIST-USPS, Office, and VisDA-2017 datasets demonstrate that PAN outperforms state-of-the-art methods in terms of average accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
47. ESIE-BERT: Enriching sub-words information explicitly with BERT for intent classification and slot filling.
- Author
-
Guo, Yu, Xie, Zhilong, Chen, Xingyan, Chen, Huangen, Wang, Leilei, Du, Huaming, Wei, Shaopeng, Zhao, Yu, Li, Qing, and Wu, Gang
- Subjects
- *
NATURAL language processing , *LANGUAGE models , *INFORMATION storage & retrieval systems , *NATURAL languages - Abstract
Natural language understanding (NLU) has two core tasks: intent classification and slot filling. The success of pre-trained language models has led to significant breakthroughs in both tasks. Architectures based on autoencoding (BERT-based models) can optimize the two tasks jointly. However, BERT-based models convert each complex token into multiple sub-tokens with the Wordpiece algorithm, which creates a misalignment between the length of the token sequence and the labels. This causes BERT-based models to perform poorly in label prediction, limiting improvement in model performance. Many existing models address this issue, but some hidden semantic information is discarded during fine-tuning. We address the problem by introducing a novel joint method on top of BERT that explicitly models multiple sub-token features after Wordpiece tokenization, contributing to both tasks. Our proposed method effectively extracts contextual features from complex tokens using the Sub-words Attention Adapter (SAA), preserving overall utterance information. Additionally, we propose an Intent Attention Adapter (IAA) to acquire comprehensive sentence features for intent prediction. Experimental results confirm that our proposed model achieves significant improvements on two public benchmark datasets. Specifically, the slot-filling F1 score improves from 96.5 to 98.2 (an absolute increase of 1.7 points) on the Airline Travel Information Systems (ATIS) dataset. [ABSTRACT FROM AUTHOR]
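The core alignment idea, pooling the several Wordpiece sub-token vectors of one word back into a single word-level vector so labels line up again, can be sketched with a small attention pool; this is an illustrative reading, not the paper's SAA module.

```python
import torch
import torch.nn as nn

class SubwordAttentionPool(nn.Module):
    """Pools one word's sub-token vectors into a single word-level vector with attention (sketch)."""
    def __init__(self, hidden=768):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, subtoken_states):                # (num_subtokens, hidden) for one word
        w = torch.softmax(self.score(subtoken_states), dim=0)
        return (w * subtoken_states).sum(dim=0)        # one vector per original token/label position

# e.g. "playing" -> ["play", "##ing"]: two encoder states pooled to one slot-label position
word_vec = SubwordAttentionPool()(torch.randn(2, 768))
```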
- Published
- 2024
- Full Text
- View/download PDF
48. Forecasting carbon price with attention mechanism and bidirectional long short-term memory network.
- Author
-
Qin, Chaoyong, Qin, Dongling, Jiang, Qiuxian, and Zhu, Bangzhu
- Subjects
- *
CARBON pricing , *HILBERT-Huang transform , *FORECASTING , *RECURRENT neural networks - Abstract
To improve the precision of carbon price forecasting, we propose a novel hybrid forecasting model that integrates recurrent neural networks and attention mechanisms. First, the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) algorithm is employed to decompose carbon prices into several regular intrinsic mode functions (IMFs) and a residual. Second, multiscale entropy is utilized to differentiate and reconstruct these components to reduce cumulative errors in subsequent forecasting. Subsequently, a bidirectional long short-term memory network (Bi-LSTM) equipped with an attention mechanism is used to forecast each reconstructed component. The attention mechanism identifies crucial sequence elements, assigns different weights to hidden information, and extracts richer information from the series. Finally, the results of all components are integrated to obtain the final forecast. Empirical analysis on real datasets from the Guangdong and Hubei carbon markets demonstrates that the proposed hybrid model outperforms prevailing mainstream forecasting models in terms of both horizontal and directional forecasting metrics. • A hybrid forecasting model is proposed for carbon price forecasting. • The ICEEMDAN algorithm is employed to decompose carbon prices into several simple modes. • A Bi-LSTM equipped with attention mechanisms is used to forecast each mode. • The proposed hybrid model outperforms prevailing mainstream forecasting models. [ABSTRACT FROM AUTHOR]
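Attention over Bi-LSTM hidden states, where each time step receives a learned weight before the forecast is produced, might be sketched as below; layer sizes and window length are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttentiveBiLSTM(nn.Module):
    """Bi-LSTM forecaster whose hidden states are combined with attention weights (sketch)."""
    def __init__(self, in_dim=1, hidden=32):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, 1)            # next-step value of one reconstructed component

    def forward(self, x):                              # x: (B, T, in_dim)
        h, _ = self.bilstm(x)                          # (B, T, 2*hidden) hidden states
        w = torch.softmax(self.attn(h), dim=1)         # weight for each time step
        context = (w * h).sum(dim=1)                   # attention-weighted summary of the window
        return self.out(context).squeeze(-1)

pred = AttentiveBiLSTM()(torch.randn(16, 30, 1))       # 30-step lookback window
```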
- Published
- 2024
- Full Text
- View/download PDF
49. Multi-scale spatial pyramid attention mechanism for image recognition: An effective approach.
- Author
-
Yu, Yang, Zhang, Yi, Cheng, Zeyu, Song, Zhe, and Tang, Chengkai
- Subjects
- *
IMAGE recognition (Computer vision) , *CONVOLUTIONAL neural networks , *PYRAMIDS , *MULTISCALE modeling - Abstract
Attention mechanisms have gradually become necessary to enhance the representational power of convolutional neural networks (CNNs). Despite recent progress in attention mechanism research, some open problems still exist. Most existing methods ignore modeling multi-scale feature representations, structural information, and long-range channel dependencies, which are essential for delivering more discriminative attention maps. This study proposes a novel, low-overhead, high-performance attention mechanism with strong generalization ability for various networks and datasets. This mechanism is called Multi-Scale Spatial Pyramid Attention (MSPA) and can be used to solve the limitations of other attention methods. For the critical components of MSPA, we not only develop the Hierarchical-Phantom Convolution (HPC) module, which can extract multi-scale spatial information at a more granular level utilizing hierarchical residual-like connections, but also design the Spatial Pyramid Recalibration (SPR) module, which can integrate structural regularization and structural information in an adaptive combination mechanism, while employing the Softmax operation to build long-range channel dependencies. The proposed MSPA is a powerful tool that can be conveniently embedded into various CNNs as a plug-and-play component. Correspondingly, using MSPA to replace the 3 × 3 convolution in the bottleneck residual blocks of ResNets, we created a series of simple and efficient backbones named MSPANet, which naturally inherit the advantages of MSPA. Without bells and whistles, our method substantially outperforms other state-of-the-art counterparts in all evaluation metrics based on extensive experimental results from CIFAR-100 and ImageNet-1K image recognition. When applying MSPA to ResNet-50, our model achieves top-1 classification accuracy of 81.74% and 78.40% on the CIFAR-100 and ImageNet-1K benchmarks, exceeding the corresponding baselines by 3.95% and 2.27%, respectively. We also obtained promising performance improvements of 1.15% and 0.91% compared to the competitive EPSANet-50. In addition, empirical research results in autonomous driving engineering applications also demonstrate that our method can significantly improve the accuracy and real-time performance of image recognition with cheaper overhead. Our code is publicly available at https://github.com/ndsclark/MSPANet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. ResQu-Net: Effective prostate's peripheral zone segmentation leveraging the representational power of attention-based mechanisms.
- Author
-
Zaridis, Dimitrios I., Mylona, Eugenia, Tachos, Nikolaos, Kalantzopoulos, Charalampos Ν., Marias, Kostas, Tsiknakis, Manolis, Matsopoulos, George K., Koutsouris, Dimitrios D., and Fotiadis, Dimitrios I.
- Subjects
PROSTATE ,STANDARD deviations ,MAGNETIC resonance imaging ,DEEP learning - Abstract
• We propose a novel deep learning model, ResQu-Net, specifically tailored for peripheral zone segmentation on T2W MR images. • Six DL architectures were implemented and compared using three openly available datasets with 392 patient cases. • A qualitative explainability analysis reveals that conventional segmentation metrics, such as Dice Score and Hausdorff Distance, might not properly reflect the performance of a DL segmentation model. Prostate cancer is one of the leading cancers among men worldwide. With more than 70% of prostate cancers arising in the peripheral zone of the prostate, accurate segmentation of this region is of paramount importance for effective diagnosis and treatment of the disease. Although the peripheral zone is well recognized as one of the most challenging regions to delineate within the prostate, no algorithms specifically tailored for this segmentation task are currently available. The present study introduces a new deep learning (DL) algorithm, named ResQu-Net, designed to accurately segment the peripheral zone (PZ) of the prostate on T2-weighted magnetic resonance imaging (MRI). Using three publicly available datasets, ResQu-Net outperformed the six DL segmentation models used for comparison, namely Attention U-Net, Dense2U-Net, Proper-Net, TransU-net, U-Net, and USE-Net, demonstrating superior performance for different anatomical regions such as the apex, the midgland, and the base. The proposed approach was assessed not only quantitatively (Sensitivity, Balanced Accuracy, Dice Score, 95% Hausdorff Distance, and Average Surface Distance) but also qualitatively: the feature maps obtained from the last layers of each model were compared with the density map of the ground truth annotations using root mean squared error. Overall, the ResQu-Net model improves on the other models by more than 5% in Dice Score and 1.87 mm in 95% Hausdorff Distance. These advancements may contribute significantly to addressing the challenges associated with PZ segmentation, ultimately enabling improved clinical decision-making and patient outcomes. [ABSTRACT FROM AUTHOR]
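The Dice Score used as the main overlap metric above is simple to compute for binary masks; a minimal NumPy sketch (mask shapes are illustrative):

```python
import numpy as np

def dice_score(pred_mask: np.ndarray, true_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary segmentation masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return float(2.0 * intersection / (pred.sum() + true.sum() + eps))

# e.g. one 256x256 predicted PZ mask against its ground-truth annotation
dsc = dice_score(np.random.rand(256, 256) > 0.5, np.random.rand(256, 256) > 0.5)
```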
- Published
- 2024
- Full Text
- View/download PDF