1,151 results on '"Attention mechanisms"'
Search Results
2. Dictionary trained attention constrained low rank and sparse autoencoder for hyperspectral anomaly detection
- Author
-
Hu, Xing, Li, Zhixuan, Luo, Lingkun, Karimi, Hamid Reza, and Zhang, Dawei
- Published
- 2025
- Full Text
- View/download PDF
3. Recording brain activity while listening to music using wearable EEG devices combined with Bidirectional Long Short-Term Memory Networks
- Author
-
Wang, Jingyi, Wang, Zhiqun, and Liu, Guiran
- Published
- 2024
- Full Text
- View/download PDF
4. The novel graph transformer-based surrogate model for learning physical systems
- Author
-
Feng, Bo and Zhou, Xiao-Ping
- Published
- 2024
- Full Text
- View/download PDF
5. R-AFNIO: Redundant IMU fusion with attention mechanism for neural inertial odometry
- Author
-
Yang, Bing, Wang, Xuan, Huang, Fengrong, Cao, Xiaoxiang, and Zhang, Zhenghua
- Published
- 2025
- Full Text
- View/download PDF
6. Volleyball training video classification description using the BiLSTM fusion attention mechanism
- Author
-
Ruiye, Zhao
- Published
- 2024
- Full Text
- View/download PDF
7. AUTOMATED ANALYSIS OF CHANGES IN PRIVACY POLICIES: A STRUCTURED SELF-ATTENTIVE SENTENCE EMBEDDING APPROACH.
- Author
-
Lin, Fangyu, Samtani, Sagar, Zhu, Hongyi, Laura, Brandimarte, and Chen, Hsinchun
- Abstract
The increasing societal concern for consumer information privacy has led to the enforcement of privacy regulations worldwide. In an effort to adhere to privacy regulations such as the General Data Protection Regulation (GDPR), many companies’ privacy policies have become increasingly lengthy and complex. In this study, we adopted the computational design science paradigm to design a novel privacy policy evolution analytics framework to help identify how companies change and present their privacy policies based on privacy regulations. The framework includes a self-attentive annotation system (SAAS) that automatically annotates paragraph-length segments in privacy policies to help stakeholders identify data practices of interest for further investigation. We rigorously evaluated SAAS against state-of-the-art machine learning (ML) and deep learning (DL)-based methods on a well-established privacy policy dataset, OPP-115. SAAS outperformed conventional ML and DL models in terms of F1-score by statistically significant margins. We demonstrate the proposed framework’s practical utility with an in-depth case study of GDPR’s impact on Amazon’s privacy policies. The case study results indicate that Amazon’s post-GDPR privacy policy potentially violates a fundamental principle of GDPR by causing consumers to exert more effort to find information about first-party data collection. Given the increasing importance of consumer information privacy, the proposed framework has important implications for regulators and companies. We discuss several design principles followed by the SAAS that can help guide future design science-based e-commerce, health, and privacy research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Enhancing Vulnerability Prioritization in Cloud Computing Using Multi-View Representation Learning.
- Author
-
Ullman, Steven, Samtani, Sagar, Zhu, Hongyi, Lazarine, Ben, Chen, Hsinchun, and Nunamaker Jr., Jay F.
- Subjects
VIRTUAL machine systems, WEB services, CLOUD computing, BEHAVIORAL research, SECURITY personnel, DEEP learning
- Abstract
Cybersecurity is a present and growing concern that needs to be addressed with both behavioral and design-oriented research. Public cloud providers such as Amazon Web Services and federal funding agencies such as the National Science Foundation have invested billions of dollars into developing high-performance computing resources accessible to users through configurable virtual machine (VM) images. This approach offers users the flexibility of changing and updating their environment for their computational needs. Despite the substantial benefits, users often introduce thousands of vulnerabilities by installing open-source software packages and misconfiguring file systems. Given the scale of vulnerabilities, security personnel struggle to identify and prioritize vulnerable assets for remediation. In this research, we designed a novel unsupervised deep learning-based Multi-View Combinatorial-Attentive Autoencoder (MV-CAAE) to capture multi-dimensional vulnerability data and automatically identify groups of similar vulnerable compute instances to help facilitate the development of targeted remediation strategies. We rigorously evaluated the proposed MV-CAAE against state-of-the-art methods in three technical clustering experiments. Experiment results indicate that the MV-CAAE achieves V-measure scores (metric of cluster quality) 8 percent-48 percent higher than benchmark methods. We demonstrated the practical value through a comprehensive case study by clustering vulnerable VMs and gathering qualitative feedback from experienced security professionals through semi-structured interviews. The results indicated that clustering vulnerable assets can help prioritize vulnerable instances for remediation and enhance decision-making tasks. The present design-research work also contributes to our theoretical knowledge of cyber-defense. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
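The abstract of record 8 reports cluster quality as V-measure. As a minimal sketch of the standard definition (the harmonic mean of homogeneity and completeness), and not the authors' evaluation code, the metric can be computed in plain Python as follows. The function names and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = 1.0 if h_classes == 0 else 1.0 - conditional_entropy(classes, clusters) / h_classes
    # Completeness: all members of a class land in the same cluster.
    completeness = 1.0 if h_clusters == 0 else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy labels: ground-truth groups vs. cluster assignments.
true_groups = ["web", "web", "db", "db"]
assigned = [1, 1, 0, 0]
score = v_measure(true_groups, assigned)
```

The score does not depend on how clusters are numbered, so a clustering that matches the reference grouping under any relabeling scores 1.0.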
9. SCANet: Semantic Coherence Attention Network for Clothing Change Person Re-identification
- Author
-
Yang, Dajiang, Wu, Wei, Lee, Yuxing, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ide, Ichiro, editor, Kompatsiaris, Ioannis, editor, Xu, Changsheng, editor, Yanai, Keiji, editor, Chu, Wei-Ta, editor, Nitta, Naoko, editor, Riegler, Michael, editor, and Yamasaki, Toshihiko, editor
- Published
- 2025
- Full Text
- View/download PDF
10. STA: Enhancing Spatio-temporal Crowd Flow Prediction Using Attention-based Deep Learning and Feature Similarity
- Author
-
Xu, Xiujuan, Liu, RenJie, Ai, Jiaxin, Liu, Yu, Zhao, Xiaowei, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sheng, Quan Z., editor, Dobbie, Gill, editor, Jiang, Jing, editor, Zhang, Xuyun, editor, Zhang, Wei Emma, editor, Manolopoulos, Yannis, editor, Wu, Jia, editor, Mansoor, Wathiq, editor, and Ma, Congbo, editor
- Published
- 2025
- Full Text
- View/download PDF
11. AttRel: Single Module Based Joint Entity and Relation Extraction with Attention Enhanced Text Embedding
- Author
-
Cui, Mengmeng, Li, Chenbin, Xiang, Haolong, Qi, Lianyong, Dou, Wanchun, Xu, Xiaolong, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sheng, Quan Z., editor, Dobbie, Gill, editor, Jiang, Jing, editor, Zhang, Xuyun, editor, Zhang, Wei Emma, editor, Manolopoulos, Yannis, editor, Wu, Jia, editor, Mansoor, Wathiq, editor, and Ma, Congbo, editor
- Published
- 2025
- Full Text
- View/download PDF
12. DELA: Dual Embedding Using LSTM and Attention for Asset Tag Inference in Industrial Automation Systems
- Author
-
Zhao, Zhen, Erickson, Brian Kenneth, Chakraborty, Shantanu, Liu, Wei, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Gong, Mingming, editor, Song, Yiliao, editor, Koh, Yun Sing, editor, Xiang, Wei, editor, and Wang, Derui, editor
- Published
- 2025
- Full Text
- View/download PDF
13. Refining Multiple Instance Learning with Attention Regularization for Whole Slide Image Classification
- Author
-
Carretero, Ilán, Meseguer, Pablo, del Amor, Rocío, Naranjo, Valery, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Julian, Vicente, editor, Camacho, David, editor, Yin, Hujun, editor, Alberola, Juan M., editor, Nogueira, Vitor Beires, editor, Novais, Paulo, editor, and Tallón-Ballesteros, Antonio, editor
- Published
- 2025
- Full Text
- View/download PDF
14. Foreign Object Classification for Coal Conveyor Belts Based on Deep Learning
- Author
-
Chen, Siyu, Pei, Mingtao, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
- Full Text
- View/download PDF
15. MST-Gait: Application of Multi-scale Temporal Modeling to Gait Recognition
- Author
-
Shen, Yuzhuo, Yan, Fei, Liu, Lan, Li, Siyu, Liu, Yunqing, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
- Full Text
- View/download PDF
16. MNA-net: Multimodal Neuroimaging Attention-Based Architecture for Cognitive Decline Prediction
- Author
-
Vo, Jamie, Sharif, Naeha, Mubashar Hassan, Ghulam, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rekik, Islem, editor, Adeli, Ehsan, editor, Park, Sang Hyun, editor, and Cintas, Celia, editor
- Published
- 2025
- Full Text
- View/download PDF
17. Short-term traffic flow prediction based on spatial–temporal attention time gated convolutional network with particle swarm optimization
- Author
-
Li, Zhongxing, Li, Zenan, Pan, Chaofeng, and Wang, Jian
- Abstract
Recently, the surge in vehicle ownership has led to a corresponding increase in the complexity of traffic data. Consequently, accurate traffic flow prediction has become crucial for effective traffic management. While the advancements in intelligent transportation system (ITS) and internet of things (IoT) technology have facilitated traffic flow prediction, many existing methods overlook the influence of the training process on model accuracy. Traditional approaches often fail to account for this critical aspect. Hence, a new approach to traffic flow prediction is introduced in this paper: a spatial–temporal attention time-gated convolutional network based on particle swarm optimization (PSO-STATG). This method uses the particle swarm algorithm to dynamically optimize the learning rate and epoch parameters throughout the training process. Firstly, spatial–temporal correlations are extracted through spatial map convolution and time-gated convolution, facilitated by an attention mechanism. Subsequently, the learning rate and epoch parameters are dynamically adjusted during the training phase via the particle swarm optimization algorithm. Finally, experiments are conducted with real-world datasets, and the results are compared with those from several existing methods. The experimental results indicate that the accuracy and stability of our proposed model in predicting traffic flow are superior. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
18. Augmented histopathology: Enhancing colon cancer detection through deep learning and ensemble techniques.
- Author
-
Gowthamy, J and Ramesh, S. S. Subashka
- Abstract
Colon cancer poses a significant threat to human life with a high global mortality rate. Early and accurate detection is crucial for improving treatment quality and the survival rate. This paper presents a comprehensive approach to enhance colon cancer detection and classification. The histopathological images are gathered from the CRC‐VAL‐HE‐7K dataset. The images undergo preprocessing to improve quality, followed by augmentation to increase dataset size and enhance model generalization. A deep learning based transformer model is designed for efficient feature extraction and enhancing classification by incorporating a convolutional neural network (CNN). A cross‐transformation model captures long‐range dependencies between regions, and an attention mechanism assigns weights to highlight crucial features. To boost classification accuracy, a Siamese network distinguishes colon cancer tissue classes based on probabilities. Optimization algorithms fine‐tune model parameters, categorizing colon cancer tissues into different classes. The multi‐class classification performance is evaluated experimentally, and the proposed model achieved the highest accuracy, 98.84%. The proposed method performed better than other existing methods in all analyses. Research Highlights: Deep learning‐based techniques are proposed. DL methods are used to enhance colon cancer detection and classification. CRC‐VAL‐HE‐7K dataset is utilized to enhance image quality. Hybrid particle swarm optimization (PSO) and dwarf mongoose optimization (DMO) are used. The deep learning models are tuned by implementing the PSO‐DMO algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
19. Hybrid-CT: a novel hybrid 2D/3D CNN-Transformer based on transfer learning and attention mechanisms for small object classification.
- Author
-
Bayoudh, Khaled and Mtibaa, Abdellatif
- Abstract
In recent years, convolutional neural networks (CNNs) have proven their effectiveness in many challenging computer vision-based tasks, including small object classification. However, according to recent literature, this task is mainly based on 2D CNNs, and the small size of object instances makes their recognition a challenging task. Since 3D CNNs are extremely tedious and time-consuming to learn, they cannot be used in a way that requires a trade-off between accuracy and efficiency. Moreover, due to the great success of Transformers in the field of natural language processing (NLP), a spatial Transformer can also be used as a robust feature transformer and has recently been successfully applied to computer vision tasks, including image classification. By incorporating attention mechanisms into the Transformers, many NLP and computer vision tasks can achieve excellent performance and help learn the contextual encoding of the input patches. However, the complexity of these tasks generally increases with the dimension of the input feature space. In this paper, we propose a novel hybrid 2D/3D CNN-Transformer based on transfer learning and attention mechanisms for better performance on a low-resolution dataset. First, the combination of a pre-trained deep CNN and a 3D CNN can significantly reduce the complexity and result in an accurate learning algorithm. Second, a pre-trained deep CNN model is used as a robust feature extractor and combined with a spatial Transformer to improve the representational power of the developed model and take advantage of the powerful global modeling capabilities of Transformers. Finally, spatial attention and channel attention are adaptively fused by focusing on all components in the input space to capture local and global spatial correlations on non-overlapping regions of the input representation. Experimental results show that the proposed framework has significant relevance in terms of efficiency and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
20. Multi-scale Unet-based feature aggregation network for lightweight image deblurring.
- Author
-
Yang, Yancheng, Gai, Shaoyan, and Da, Feipeng
- Abstract
The single image deblurring task has made remarkable progress, with convolutional neural networks exhibiting extraordinary performance. However, existing methods maintain high-quality reconstruction through an excessive number of parameters and extremely deep network structures, which results in increased requirements for computational resources and memory storage, making it challenging to deploy on resource-constrained devices. Numerous experiments indicate that current models still possess redundant parameters. To address these issues, we introduce a multi-scale Unet-based feature aggregation network (MUANet). This network architecture is based on a single-stage Unet, which significantly simplifies the network’s complexity. A lightweight Unet-based attention block is designed, based on a progressive feature extraction module to enhance feature extraction from multi-scale attention modules. Given the extraordinary performance of the self-attention mechanism, we propose a self-attention mechanism based on fourier transform and a depthwise convolutional feed-forward network to enhance the network’s feature extraction capability. This module contains extractors with different receptive fields for feature extraction at different spatial scales and capturing contextual information. Through the aggregation of multi-scale features from different attention mechanisms, our method learns a set of rich features that retain contextual information from multiple scales and high-resolution spatial details. Extensive experiments show that the proposed MUANet achieves competitive results in lightweight deblurring qualitative and quantitative evaluations. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
21. Optimized GRU‐Based Voltage Fault Prediction Method for Lithium‐Ion Battery Packs in Real‐Life.
- Author
-
Shen, Hongyu, Liu, Yuefeng, Zhao, Qiyan, Xue, Guoyue, and Zhang, Tiange
- Subjects
*OPTIMIZATION algorithms, *RANDOM forest algorithms, *DUNG beetles, *ELECTRIC transformers, *VOLTAGE, *OVERVOLTAGE
- Abstract
Various failures of lithium‐ion batteries threaten the safety and performance of the battery system. Because cell anomalies are subtle and cell properties are nonlinear and time‐varying, current methods for identifying the diverse faults in battery packs suffer from low accuracy and cannot precisely determine the type of fault. This study proposes a hybrid model for predicting voltage faults in lithium‐ion battery packs. It uses the Random Forest (RF) algorithm to select key factors influencing voltage, optimizes model parameters through an Improved Dung Beetle Optimization (IDBO) algorithm, and employs a Gated Recurrent Unit (GRU) integrated with a channel and time attention mechanism (CTAM) for voltage fault prediction. Voltage consistency is measured by quantifying the predicted voltage curve using the curve Manhattan distance, ultimately identifying faults such as overvoltage, undervoltage, and inconsistency of the battery pack. The experimental results show that the proposed hybrid model outperforms state‐of‐the‐art techniques such as Informer and Transformer in voltage fault prediction, achieving MAE, MSE, and MAPE of 0.009272%, 0.000222%, and 0.246%, respectively, while maintaining high efficiency in terms of the number of parameters and runtime. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. MTAF–DTA: multi-type attention fusion network for drug–target affinity prediction
- Author
-
Sun, Jinghong, Wang, Han, Mi, Jia, Wan, Jing, and Gao, Jingyang
- Subjects
*DRUG discovery, *DRUG interactions, *ARTIFICIAL intelligence, *BLOCK designs, *THERAPEUTICS
- Abstract
Background: The development of drug–target binding affinity (DTA) prediction tasks significantly drives the drug discovery process forward. Leveraging the rapid advancement of artificial intelligence, DTA prediction tasks have undergone a transformative shift from wet lab experimentation to machine learning-based prediction. This transition enables a more expedient exploration of potential interactions between drugs and targets, leading to substantial savings in time and funding resources. However, existing methods still face several challenges, such as drug information loss, lack of calculation of the contribution of each modality, and lack of simulation regarding the drug–target binding mechanisms. Results: We propose MTAF–DTA, a method for drug–target binding affinity prediction to solve the above problems. The drug representation module extracts three modalities of features from drugs and uses an attention mechanism to update their respective contribution weights. Additionally, we design a Spiral-Attention Block (SAB) as drug–target feature fusion module based on multi-type attention mechanisms, facilitating a triple fusion process between them. The SAB, to some extent, simulates the interactions between drugs and targets, thereby enabling outstanding performance in the DTA task. Our regression task on the Davis and KIBA datasets demonstrates the predictive capability of MTAF–DTA, with CI and MSE metrics showing respective improvements of 1.1% and 9.2% over the state-of-the-art (SOTA) method in the novel target settings. Furthermore, downstream tasks further validate MTAF–DTA's superiority in DTA prediction. Conclusions: Experimental results and case study demonstrate the superior performance of our approach in DTA prediction tasks, showing its potential in practical applications such as drug discovery and disease treatment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Bearing fault diagnosis method based on extra-perceptual graph neural network.
- Author
-
陈岩岩 and 朱彦敏
- Abstract
Traditional graph neural networks do not account for the difficulty of feature extraction, and their weak extraction ability, under fault, noise, and vibration waveform changes, which leads to low bearing fault diagnosis accuracy under real working conditions. To solve this problem, a bearing fault diagnosis method based on an extra-perceptual graph neural network (E-GCN) was proposed. The method introduces a dilation residual module, which effectively expands the receptive field and learns richer and more effective features, thereby improving performance and generalization ability. It constructs an additional graph convolution module to obtain the global information of the graph, and it uses an attention module to weight the features of neighbor nodes so that the network focuses more on important features, which further improves the accuracy and efficiency of diagnosis. Experiments were carried out on the bearing datasets of Paderborn University (PU) and Case Western Reserve University (CWRU), and the diagnostic accuracy reached 98.2% and 99.1%, respectively. This method can be effectively applied to fault diagnosis of bearings and other machinery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. An Attribute Graph Embedding Algorithm for Sensing Topological and Attribute Influence.
- Author
-
Chen, Dongming, Zhang, Shuyue, Zhao, Yumeng, Xie, Mingzhao, and Wang, Dongqi
- Subjects
*GRAPH neural networks, *GRAPH algorithms, *PROBLEM solving, *TOPOLOGY, *SCALABILITY
- Abstract
The unsupervised attribute graph embedding technique aims to learn low-dimensional node embedding using neighborhood topology and attribute information under unlabeled data. Current unsupervised models are mostly based on graph self-encoders, but full-batch training limits the scalability of the model and ignores attribute integrity when reconstructing the topology. In order to solve the above problems while considering the unsupervised learning of the model and full use of node information, this paper proposes a graph neural network architecture based on a graph self-encoder to capture the nonlinearity of the attribute graph data, and an attribute graph embedding algorithm that explicitly models the influence of neighborhood information using a multi-level attention mechanism. Specifically, the proposed algorithm fuses topology information and attribute information using a lightweight sampling strategy, constructs an unbiased graph self-encoder on the sampled graph, implements topology aggregation and attribute aggregation, respectively, models the correlation between topology embedding and attribute embedding, and considers multi-level loss terms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Planetary Gearboxes Fault Diagnosis Based on Markov Transition Fields and SE-ResNet.
- Author
-
Liu, Yanyan, Gao, Tongxin, Wu, Wenxu, and Sun, Yongquan
- Abstract
The working conditions of planetary gearboxes are complex, and their structural couplings are strong, leading to low reliability. Traditional deep neural networks often struggle with feature learning in noisy environments, and their reliance on one-dimensional signals as input fails to capture the interrelationships between data points. To address these challenges, we proposed a fault diagnosis method for planetary gearboxes that integrates Markov transition fields (MTFs) and a residual attention mechanism. The MTF was employed to encode one-dimensional signals into feature maps, which were then fed into a residual networks (ResNet) architecture. To enhance the network's ability to focus on important features, we embedded the squeeze-and-excitation (SE) channel attention mechanism into the ResNet34 network, creating a SE-ResNet model. This model was trained to effectively extract and classify features. The developed method was validated using a specific dataset and achieved an accuracy of about 98.1%. The results demonstrate the effectiveness and reliability of the developed method in diagnosing faults in planetary gearboxes under strong noise conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
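The abstract of record 25 embeds squeeze-and-excitation (SE) channel attention in a ResNet. As a rough sketch of the general SE idea only, and not the authors' SE-ResNet34 implementation, the block below pools each channel to a scalar, passes the result through two small fully connected layers, and rescales each channel by a sigmoid gate. The function names, the nested-list tensor layout, the weight values, and the omission of biases are all illustrative assumptions.

```python
from math import exp


def se_gates(feature_maps, w_reduce, w_expand):
    """Compute one sigmoid gate per channel.

    feature_maps: list of C channels, each an H x W nested list.
    w_reduce:     (C // r) x C weight matrix (hypothetical values, no bias).
    w_expand:     C x (C // r) weight matrix (hypothetical values, no bias).
    """
    # Squeeze: global average pooling over each channel's spatial grid.
    squeezed = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, step 2: expand back to C values and squash to (0, 1).
    return [
        1.0 / (1.0 + exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]


def se_block(feature_maps, w_reduce, w_expand):
    """Rescale every channel of the input by its gate."""
    gates = se_gates(feature_maps, w_reduce, w_expand)
    return [
        [[gate * value for value in row] for row in fmap]
        for fmap, gate in zip(feature_maps, gates)
    ]


# Hypothetical input: 2 channels of size 2 x 2, reduction ratio r = 2.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
w1 = [[0.3, -0.2]]       # 1 x 2 bottleneck weights (made up)
w2 = [[0.8], [-0.4]]     # 2 x 1 expansion weights (made up)
y = se_block(x, w1, w2)
```

In a trained network the two weight matrices are learned, so channels that carry fault-relevant features receive gates closer to 1 and the rest are suppressed.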
26. An innovative approach to vibration signal denoising and fault diagnosis using attention-enriched joint learning.
- Author
-
Xiang, Feifan, Wang, Zili, Qiu, Lemiao, Zhang, Shuyou, Zhu, Linhao, Zhang, Huang, and Tan, Jianrong
- Subjects
*FAULT diagnosis, *SIGNAL denoising, *ROLLER bearings, *INFORMATION sharing, *SIGNAL processing
- Abstract
Vibration signals play a crucial role in mechanical fault diagnosis. However, they are susceptible to various noise disturbances, presenting challenges for reliable fault detection. We propose an end-to-end Cross-task Attention Joint Learning (CTA-JL) model that concurrently denoises and diagnoses faults in noisy signals. This model utilizes a multi-task encoder, composed of task-shared and task-specific feature encoding units, along with a feature information exchange unit with a Cross-task Attention (CTA) mechanism, fostering information exchange across different tasks. By collectively executing diagnosis and denoising tasks and sharing valuable task information, the model enhances prediction accuracy and denoising performance. Under three noise conditions of SNR = −9 dB, −6 dB, and −3 dB, the prediction accuracy of CTA-JL on the rolling bearing datasets reached 91.38%, 97.95%, and 99.69%, respectively. Meanwhile, the result on elevator guide system datasets reached 87.31%, 95.58%, and 99.64% [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. KTMN: Knowledge-driven Two-stage Modulation Network for visual question answering.
- Author
-
Shi, Jingya, Han, Dezhi, Chen, Chongqing, and Shen, Xiang
- Abstract
Existing visual question answering (VQA) methods introduce the Transformer as the backbone architecture for intra- and inter-modal interactions, demonstrating its effectiveness in dependency relationship modeling and information alignment. However, the Transformer’s inherent attention mechanisms tend to be affected by irrelevant information and do not utilize the positional information of objects in the image during the modelling process, which hampers its ability to adequately focus on key question words and crucial image regions during answer inference. Considering this issue is particularly pronounced on the visual side, this paper designs a Knowledge-driven Two-stage Modulation self-attention mechanism to optimize the internal interaction modeling of image sequences. In the first stage, we integrate textual context knowledge and the geometric knowledge of visual objects to modulate and optimize the query and key matrices. This effectively guides the model to focus on visual information relevant to the context and geometric knowledge during the information selection process. In the second stage, we design an information comprehensive representation to apply a secondary modulation to the interaction results from the first modulation. This further guides the model to fully consider the overall context of the image during inference, enhancing its global understanding of the image content. On this basis, we propose a Knowledge-driven Two-stage Modulation Network (KTMN) for VQA, which enables fine-grained filtering of redundant image information while more precisely focusing on key regions. Finally, extensive experiments conducted on the datasets VQA v2 and CLEVR yielded Overall accuracies of 71.36% and 99.20%, respectively, providing ample validation of the proposed method’s effectiveness and rationality. Source code is available at . [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. CFFANet: category feature fusion and attention mechanism network for retinal vessel segmentation.
- Author
-
Chen, Qiyu, Wang, Jianming, Yin, Jiting, and Yang, Zizhong
- Abstract
Retinal vessel segmentation is a computer-aided diagnostic method for ophthalmic disease analysis. Owing to the complex structure of the retinal vasculature, it is difficult for the segmentation network to capture effective features, and the semantic gap between different layers of features leads to insufficient feature fusion and thus makes segmentation difficult. In this paper, we propose a new segmentation network called CFFANet. Firstly, to capture accurate and sufficient global and local features, we design a Multi-scale Residual Pooling Module. In addition, a Category Feature Fusion Module is proposed to fuse category features at different stages to reduce the semantic gap between layers. Finally, a Frequency Channel Fusion Cross Attention Module is incorporated to reduce redundant semantic information during feature fusion. We conducted experiments on the DRIVE, CHASEDB1 and STARE datasets. The Dice and MIoU scores of the CFFANet network on the above datasets are 83.0, 82.9, 84.2, 84.1, and 84.1, 84.5. The ablation experiments also validate the effectiveness of the main modules in the network. The experiments show the value of the method in retinal vessel segmentation tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. MDA-YOLO Person: a 2D human pose estimation model based on YOLO detection framework.
- Author
-
Dong, Chengang, Tang, Yuhao, and Zhang, Liyan
- Subjects
*BODY image, *HUMAN body, *POSE estimation (Computer vision), *ARCHAEOLOGICAL human remains, *PERSONAL names, *DETECTORS
- Abstract
Human pose estimation aims to locate and predict the key points of the human body in images or videos. Due to the challenges of capturing complex spatial relationships and handling different body scales, accurate estimation of human pose remains challenging. Our work proposes a real-time human pose estimation method based on the anchor-assisted YOLOv7 framework, named MDA-YOLO Person. In this study, we propose the Keypoint Augmentation Strategies (KAS) to overcome the challenges faced in human pose estimation and improve the model's ability to accurately predict keypoints. Furthermore, we introduce the Anchor Adjustment Module (AAM) as a replacement for the original YOLOv7's detection head. By adjusting the parameters associated with the detector's anchors, we achieve an increased recall rate and enhance the completeness of the pose estimation. Additionally, we incorporate the Multi-Scale Dual-Head Attention (MDA) module, which effectively models the weights of both channel and spatial dimensions at multiple scales, enabling the model to focus on more salient feature information. As a result, our approach outperforms other methods, as demonstrated by the promising results obtained on two large-scale public datasets. MDA-YOLO Person outperforms the baseline model YOLOv7-pose on both MS COCO 2017 and CrowdPose datasets, with improvements of 2.2% and 3.7% in precision and recall on MS COCO 2017, and 1.9% and 3.5% on CrowdPose, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Extraction of entity relationships serving the field of agriculture food safety regulation.
- Author
-
Zhao, Zhihua, Liu, Yiming, Lv, Dongdong, Li, Ruixuan, Yu, Xudong, and Mao, Dianhui
- Abstract
Agriculture food (agri-food) safety is closely related to all aspects of people's lives. In recent years, with the emergence of deep learning technology based on big data, the extraction of information relations in the field of agri-food safety supervision has become a research hotspot. However, most of the current work only expands the relationship recognition based on the traditional named entity recognition task, which makes it difficult to establish a true 'connection' between entities and relationships. The pipelined and federated extraction architectures that have emerged in this area are problematic in practice. In addition, the contextual information of the text corpus in the agri-food safety regulatory domain has not been fully utilized. To address the above issues, this paper proposes a semi-joint entity relationship extraction model (EB-SJRE) based on contextual entity boundary features. Firstly, a Token pair subject-object correspondence matrix label is designed to intuitively model the subject-object boundary, which is more friendly to complex entities in the field of agri-food safety regulation. Secondly, the dynamic fine-tuning of Bert makes the text embedding more relevant to the textual context of the agri-food safety regulation domain. Finally, we introduce an attention mechanism in the Token pair tagging framework to capture deep semantic subject-object boundary association information, which cleverly solves the problem of bias exposure due to the pipeline structure and the dimensional explosion due to the joint extraction structure. The experimental results show that our model achieves the best F1-score of 88.71% on agri-food safety regulation domain data and F1-scores of 92.36%, 92.80%, 88.91%, and 92.21% on NYT, NYT-star, WebNLG, and WebNLG-star, respectively. This indicates that EB-SJRE has excellent generalization ability in both the agri-food safety regulatory and public sectors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram.
- Author
-
Chen, Rujia, Ghobakhlou, Akbar, and Narayanan, Ajit
- Abstract
Featured Application: This proposed method could potentially be applied to musical instrument classification tasks, contributing to the organization and analysis of audio data where identifying instruments is required. This may be useful in research, music information retrieval systems, or other related applications. Musical instrument recognition is a relatively unexplored area of machine learning due to the need to analyze complex spatial–temporal audio features. Traditional methods using individual spectrograms, like STFT, Log-Mel, and MFCC, often miss the full range of features. Here, we propose a hierarchical residual attention network using a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (Chroma, Spectral contrast, and Tonnetz), to create a comprehensive sound representation. This model enhances the focus on relevant spectrogram parts through attention mechanisms. Experimental results with the OpenMIC-2018 dataset show significant improvement in classification accuracy, especially with the "Magnified 1/4 Size" configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Adaptive Dynamic Shuffle Convolutional Parallel Network for Image Super-Resolution.
- Author
-
Long, Yiting, Ruan, Haoyu, Zhao, Hui, Liu, Yi, Zhu, Lei, Zhang, Chengyuan, and Zhu, Xinghui
- Subjects
IMAGE reconstruction, HIGH resolution imaging, FEATURE extraction, COMPUTATIONAL complexity, IMAGE reconstruction algorithms, DEEP learning
- Abstract
Image super-resolution has experienced significant advancements with the emergence of deep learning technology. However, deploying highly complex super-resolution networks on resource-constrained devices poses a challenge due to their substantial computational requirements. This paper presents the Adaptive Dynamic Shuffle Convolutional Parallel Network (ADSCPN), a novel lightweight super-resolution model designed to achieve an optimal balance between computational efficiency and image reconstruction quality. The ADSCPN framework employs large-kernel parallel depthwise separable convolutions, dynamic convolutions, and an enhanced attention mechanism to optimize feature extraction and improve detail preservation. Extensive evaluations on standard benchmark datasets demonstrate that ADSCPN achieves state-of-the-art performance while significantly reducing computational complexity, making it well-suited for practical applications on devices with limited computational resources. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. A Kernel Attention-based Transformer Model for Survival Prediction of Heart Disease Patients.
- Author
-
Kaushal, Palak, Singh, Shailendra, and Vijayvergiya, Rajesh
- Abstract
Survival analysis is employed to scrutinize time-to-event data, with emphasis on comprehending the duration until the occurrence of a specific event. In this article, we introduce two novel survival prediction models: CosAttnSurv and CosAttnSurv + DyACT. CosAttnSurv model leverages transformer-based architecture and a softmax-free kernel attention mechanism for survival prediction. Our second model, CosAttnSurv + DyACT, enhances CosAttnSurv with Dynamic Adaptive Computation Time (DyACT) control, optimizing computation efficiency. The proposed models are validated using two public clinical datasets related to heart disease patients. When compared to other state-of-the-art models, our models demonstrated an enhanced discriminative and calibration performance. Furthermore, in comparison to other transformer architecture-based models, our proposed models demonstrate comparable performance while exhibiting significant reduction in both time and memory requirements. Overall, our models offer significant advancements in the field of survival analysis and emphasize the importance of computationally effective time-based predictions, with promising implications for medical decision-making and patient care. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Distributed CV classification with attention mechanisms.
- Author
-
Chafi, Soumia, Kabil, Mustapha, and Kamouss, Abdessamad
- Subjects
RECURRENT neural networks, ARTIFICIAL intelligence, DEEP learning, TRANSFORMER models, SENTIMENT analysis
- Abstract
Text classification is a crucial domain within natural language processing (NLP), with applications ranging from document categorization to sentiment analysis. In this context, the use of attention mechanisms in neural networks has emerged as an effective method to enhance the performance of classification models. This comparative study focuses on the application of these mechanisms to CV classification, adopting a distributed approach with Apache Spark to handle large datasets. We explore several neural network architectures, including recurrent neural networks (RNNs) and transformer neural networks, integrating various attention mechanisms such as global attention, contextual attention, and multi-head attention. The performance of these models is compared to traditional text classification methods such as SVMs, taking into account the scalability and processing speed offered by Spark. Experiments are conducted on a diverse dataset of CVs. Results show that models based on neural networks with attention mechanisms, combined with a distributed Spark architecture, significantly outperform traditional approaches. Additionally, we analyze the impact of various Spark configuration parameters, such as the number of nodes and allocated memory, on model performance. In conclusion, this study demonstrates the effectiveness of attention mechanisms in the specific context of CV classification, highlighting the advantages of a distributed approach for efficient processing of large textual datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. AMTLUS: Attention-guided multi-task learning with uncertainty estimation in skin lesion segmentation and classification.
- Author
-
Kasukurthi, Aravinda and Davuluri, Rajya Lakshmi
- Subjects
FEATURE extraction, DERMOSCOPY, DIAGNOSIS, EARLY diagnosis, DEEP learning, MELANOMA
- Abstract
Skin lesion segmentation and classification from dermoscopic images have emerged as pivotal research topics, playing a vital role in early detection and diagnosis of skin diseases, including melanoma. Previous studies have employed various deep learning models for skin lesion segmentation and classification, enabling the automatic learning of complex and discriminative features from dermoscopic images. However, inherent challenges arise due to the variance in skin lesion shape, size, and contrast, leading to intrinsic limitations of former models, such as Isolated Representation Learning, Uniform Attention, Limited Model Generalization, Reduced Model Interpretability, and Uncertainty. To address these limitations and propel the field forward, this paper introduces a novel framework called AMTLUS that leverages Multi-Task Learning (MTL) in conjunction with deep Attention Mechanisms and Uncertainty Estimation. The integration of MTL facilitates joint training of segmentation and classification tasks, enabling shared representation learning and efficient utilization of data. Incorporating attention mechanisms dynamically focuses on informative regions within dermoscopic images, improving segmentation accuracy and feature extraction for classification. Uncertainty estimation techniques quantify model confidence, offering probabilistic interpretations for improved reliability and interpretability. Our extensive experiments conducted on the ISIC-2016 dataset demonstrate superior accuracy and reliability, showcasing the proposed model's capability to identify challenging cases. This deep learning framework represents a significant advancement in automated skin lesion analysis, enhancing early detection and diagnosis of skin diseases, including melanoma. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Three-Dimensional Instance Segmentation Using the Generalized Hough Transform and the Adaptive n-Shifted Shuffle Attention.
- Author
-
Mulindwa, Desire Burume, Du, Shengzhi, and Liu, Qingxue
- Subjects
*AUGMENTED reality, *AUTONOMOUS vehicles, *ROBOTICS, *NOISE, *HOUGH transforms
- Abstract
The progress of 3D instance segmentation techniques has made it essential for several applications, such as augmented reality, autonomous driving, and robotics. Traditional methods usually have challenges with complex indoor scenes made of multiple objects with different occlusions and orientations. In this work, the authors present an innovative model that integrates a new adaptive n-shifted shuffle (ANSS) attention mechanism with the Generalized Hough Transform (GHT) for robust 3D instance segmentation of indoor scenes. The proposed technique leverages the n-shifted sigmoid activation function, which improves the adaptive shuffle attention mechanism, permitting the network to dynamically focus on relevant features across various regions. A learnable shuffling pattern is produced through the proposed ANSS attention mechanism to spatially rearrange the relevant features, thus augmenting the model's ability to capture the object boundaries and their fine-grained details. The integration of GHT furnishes a vigorous framework to localize and detect objects in the 3D space, even when heavy noise and partial occlusions are present. The authors evaluate the proposed method on the challenging Stanford 3D Indoor Spaces Dataset (S3DIS), where it establishes its superiority over existing methods. The proposed approach achieves state-of-the-art performance in both mean Intersection over Union (IoU) and overall accuracy, showcasing its potential for practical deployment in real-world scenarios. These results illustrate that the integration of the ANSS and the GHT yields a robust solution for 3D instance segmentation tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. SSFAN: A Compact and Efficient Spectral-Spatial Feature Extraction and Attention-Based Neural Network for Hyperspectral Image Classification.
- Author
-
Wang, Chunyang, Zhan, Chao, Lu, Bibo, Yang, Wei, Zhang, Yingjie, Wang, Gaige, and Zhao, Zongze
- Subjects
*IMAGE recognition (Computer vision), *CONVOLUTIONAL neural networks, *TRANSFORMER models, *LAND cover, *PINE
- Abstract
Hyperspectral image (HSI) classification is a crucial technique that assigns each pixel in an image to a specific land cover category by leveraging both spectral and spatial information. In recent years, HSI classification methods based on convolutional neural networks (CNNs) and Transformers have significantly improved performance due to their strong feature extraction capabilities. However, these improvements often come with increased model complexity, leading to higher computational costs. To address this, we propose a compact and efficient spectral-spatial feature extraction and attention-based neural network (SSFAN) for HSI classification. The SSFAN model consists of three core modules: the Parallel Spectral-Spatial Feature Extraction Block (PSSB), the Scan Block, and the Squeeze-and-Excitation MLP Block (SEMB). After preprocessing the HSI data, it is fed into the PSSB module, which contains two parallel streams, each comprising a 3D convolutional layer and a 2D convolutional layer. The 3D convolutional layer extracts spectral and spatial features from the input hyperspectral data, while the 2D convolutional layer further enhances the spatial feature representation. Next, the Scan Block module employs a layered scanning strategy to extract spatial information at different scales from the central pixel outward, enabling the model to capture both local and global spatial relationships. The SEMB module combines the Spectral-Spatial Recurrent Block (SSRB) and the MLP Block. The SSRB, with its adaptive weight assignment mechanism in the SToken Module, flexibly handles time steps and feature dimensions, performing deep spectral and spatial feature extraction through multiple state updates. Finally, the MLP Block processes the input features through a series of linear transformations, GELU activation functions, and Dropout layers, capturing complex patterns and relationships within the data, and concludes with an argmax layer for classification. 
Experimental results show that the proposed SSFAN model delivers superior classification performance, outperforming the second-best method by 1.72%, 5.19%, and 1.94% in OA, AA, and Kappa coefficient, respectively, on the Indian Pines dataset. Additionally, it requires less training and testing time compared to other state-of-the-art deep learning methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Lightweight enhanced YOLOv8n underwater object detection network for low light environments.
- Author
-
Ding, Jifeng, Hu, Junquan, Lin, Jiayuan, and Zhang, Xiaotong
- Subjects
*ATTENUATION of light, *DATA augmentation, *FEATURE extraction, *ALGORITHMS, *GENERALIZATION
- Abstract
In response to the challenges of target misidentification, missed detection, and other issues arising from severe light attenuation, low visibility, and complex environments in current underwater target detection, we propose a lightweight low-light underwater target detection network, named PDSC-YOLOv8n. Firstly, we enhance the input images using the improved Pro MSRCR algorithm for data augmentation. Secondly, we replace the traditional convolutions in the backbone and neck networks of YOLOv8n with Ghost and GSConv modules respectively to achieve lightweight network modeling. Additionally, we integrate the improved DCNv3 module into the C2f module of the backbone network to enhance the capability of target feature extraction. Furthermore, we introduce the GAM into the SPPF and incorporate the CBAM attention mechanism into the last layer of the backbone network to enhance the model's perceptual and generalization capabilities. Finally, we optimize the training process of the model using WIoUv3 as the loss function. The model is successfully deployed on an embedded platform, where it achieves real-time detection of underwater targets. We first conduct experiments on the RUOD underwater dataset and also evaluate our approach on the Pascal VOC2012 dataset. On the RUOD dataset, the original YOLOv8n algorithm achieves mAP@0.5 and mAP@0.5:0.95 of 79.6% and 58.2%, respectively, while PDSC-YOLOv8n reaches 86.1% and 60.8%. The number of parameters of the model is reduced by about 15.5%, and the detection accuracy is improved by 6.5%. On the Pascal VOC dataset, the original YOLOv8n algorithm achieves mAP@0.5 and mAP@0.5:0.95 of 73% and 53.2%, respectively, while PDSC-YOLOv8n reaches 78.5% and 57%. The superior performance of PDSC-YOLOv8n indicates its effectiveness in the field of underwater target detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Transformer-Enhanced Retinal Vessel Segmentation for Diabetic Retinopathy Detection Using Attention Mechanisms and Multi-Scale Fusion.
- Author
-
Kim, Hyung-Joo, Eesaar, Hassan, and Chong, Kil To
- Subjects
DEEP learning, VISION disorders, MEDICAL screening, PEOPLE with visual disabilities, RETINAL blood vessels, PYRAMIDS, DIABETIC retinopathy
- Abstract
Eye health has become a significant concern in recent years, given the rising prevalence of visual impairment resulting from various eye disorders and related factors. Global surveys suggest that approximately 2.2 billion individuals are visually impaired, with at least 1 billion affected by treatable diseases or ailments. Early detection, treatment, and screening for fundus diseases are crucial in addressing these challenges. In this study, we propose a novel segmentation model for retinal vascular delineation aimed at diagnosing diabetic retinopathy. The model integrates CBAM (Channel-Attention and Spatial-Attention) for enhanced feature representation, JPU (Joint Pyramid Upsampling) for multi-scale feature fusion, and transformer blocks for contextual understanding. Leveraging deep-learning techniques, our proposed model outperforms existing approaches in retinal vascular segmentation, achieving a Mean IoU of 0.8047, Recall of 0.7254, Precision of 0.8492, F1 Score of 0.7824, and Specificity of 0.9892 on the CHASEDB1 dataset. Extensive evaluations on benchmark datasets demonstrate its efficacy, highlighting its potential for automated diabetic retinopathy screening. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
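Note on the CHASEDB1 figures quoted in the abstract above: the F1 score is the harmonic mean of precision and recall, so the reported value can be checked against the other two. This is a small illustrative sketch only. The function name is an assumption and nothing here comes from the authors' code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract for CHASEDB1.
    print(round(f1_score(0.8492, 0.7254), 4))
```

Rounded to four decimals, the result agrees with the F1 score of 0.7824 stated in the abstract.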
40. Research on an Eye Control Method Based on the Fusion of Facial Expression and Gaze Intention Recognition.
- Author
-
Sun, Xiangyang and Cai, Zihan
- Subjects
EYE movements, FACIAL expression, ARTIFICIAL intelligence, PROBLEM solving, PSYCHOLOGY, EYE tracking
- Abstract
With the deep integration of psychology and artificial intelligence technology and other related technologies, eye control technology has achieved certain results at the practical application level. However, it is found that the accuracy of the current single-modal eye control technology is still not high, which is mainly caused by the inaccurate eye movement detection caused by the high randomness of eye movements in the process of human–computer interaction. Therefore, this study will propose an intent recognition method that fuses facial expressions and eye movement information and expects to complete an eye control method based on the fusion of facial expression and eye movement information based on the multimodal intent recognition dataset, including facial expressions and eye movement information constructed in this study. Based on the self-attention fusion strategy, the fused features are calculated, and the multi-layer perceptron is used to classify the fused features, so as to realize the mutual attention between different features, and improve the accuracy of intention recognition by enhancing the weight of effective features in a targeted manner. In order to solve the problem of inaccurate eye movement detection, an improved YOLOv5 model was proposed, and the accuracy of the model detection was improved by adding two strategies: a small target layer and a CA attention mechanism. At the same time, the corresponding eye movement behavior discrimination algorithm was combined for each eye movement action to realize the output of eye behavior instructions. Finally, the experimental verification of the eye–computer interaction scheme combining the intention recognition model and the eye movement detection model showed that the accuracy of the eye-controlled manipulator to perform various tasks could reach more than 95 percent based on this scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Novel Approach in Vegetation Detection Using Multi-Scale Convolutional Neural Network.
- Author
-
Albalooshi, Fatema A.
- Subjects
CONVOLUTIONAL neural networks, VEGETATION patterns, LAND management, FEATURE extraction, ENVIRONMENTAL monitoring
- Abstract
Vegetation segmentation plays a crucial role in accurately monitoring and analyzing vegetation cover, growth patterns, and changes over time, which in turn contributes to environmental studies, land management, and assessing the impact of climate change. This study explores the potential of a multi-scale convolutional neural network (MSCNN) design for object classification, specifically focusing on vegetation detection. The MSCNN is designed to integrate multi-scale feature extraction and attention mechanisms, enabling the model to capture both fine and coarse vegetation patterns effectively. Moreover, the MSCNN architecture integrates multiple convolutional layers with varying kernel sizes (3 × 3, 5 × 5, and 7 × 7), enabling the model to extract features at different scales, which is vital for identifying diverse vegetation patterns across various landscapes. Vegetation detection is demonstrated using three diverse datasets: the CamVid dataset, the FloodNet dataset, and the multispectral RIT-18 dataset. These datasets present a range of challenges, including variations in illumination, the presence of shadows, occlusion, scale differences, and cluttered backgrounds, which are common in real-world scenarios. The MSCNN architecture allows for the integration of information from multiple scales, facilitating the detection of diverse vegetation types under varying conditions. The performance of the proposed MSCNN method is rigorously evaluated and compared against state-of-the-art techniques in the field. Comprehensive experiments showcase the effectiveness of the approach, highlighting its robustness in accurately segmenting and classifying vegetation even in complex environments. The results indicate that the MSCNN design significantly outperforms traditional methods, achieving a remarkable global accuracy and boundary F1 score (BF score) of up to 98%. 
This superior performance underscores the MSCNN's capability to enhance vegetation detection in imagery, making it a promising tool for applications in environmental monitoring and land use management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. A comprehensive guide to content-based image retrieval algorithms with visualsift ensembling.
- Author
-
Ramesh Babu Durai, C., Sathesh Raaj, R., Sekharan, Sindhu Chandra, and Nishok, V.S.
- Abstract
BACKGROUND: Content-based image retrieval (CBIR) systems are vital for managing the large volumes of data produced by medical imaging technologies. They enable efficient retrieval of relevant medical images from extensive databases, supporting clinical diagnosis, treatment planning, and medical research. OBJECTIVE: This study aims to enhance CBIR systems' effectiveness in medical image analysis by introducing the VisualSift Ensembling Integration with Attention Mechanisms (VEIAM). VEIAM seeks to improve diagnostic accuracy and retrieval efficiency by integrating robust feature extraction with dynamic attention mechanisms. METHODS: VEIAM combines Scale-Invariant Feature Transform (SIFT) with selective attention mechanisms to emphasize crucial regions within medical images dynamically. Implemented in Python, the model integrates seamlessly into existing medical image analysis workflows, providing a robust and accessible tool for clinicians and researchers. RESULTS: The proposed VEIAM model demonstrated an impressive accuracy of 97.34% in classifying and retrieving medical images. This performance indicates VEIAM's capability to discern subtle patterns and textures critical for accurate diagnostics. CONCLUSIONS: By merging SIFT-based feature extraction with attention processes, VEIAM offers a discriminatively powerful approach to medical image analysis. Its high accuracy and efficiency in retrieving relevant medical images make it a promising tool for enhancing diagnostic processes and supporting medical research in CBIR systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Optimizing Fire Scene Analysis: Hybrid Convolutional Neural Network Model Leveraging Multiscale Feature and Attention Mechanisms.
- Author
-
Muksimova, Shakhnoza, Umirzakova, Sabina, Abdullaev, Mirjamol, and Cho, Young-Im
- Subjects
*ARTIFICIAL neural networks, *CONVOLUTIONAL neural networks, *FEATURE extraction, *EMERGENCY management, *IMAGE processing, *DEEP learning, *FIRE detectors
- Abstract
The rapid and accurate detection of fire scenes in various environments is crucial for effective disaster management and mitigation. Fire scene classification is a critical aspect of modern fire detection systems that directly affects public safety and property preservation. This research introduced a novel hybrid deep learning model designed to enhance the accuracy and efficiency of fire scene classification across diverse environments. The proposed model integrates advanced convolutional neural networks with multiscale feature extraction, attention mechanisms, and ensemble learning to achieve superior performance in real-time fire detection. By leveraging the strengths of pre-trained networks such as ResNet50, VGG16, and EfficientNet-B3, the model captures detailed features at multiple scales, ensuring robust detection capabilities. Including spatial and channel attention mechanisms further refines the focus on critical areas within the input images, reducing false positives and improving detection precision. Extensive experiments on a comprehensive dataset encompassing wildfires, building fires, vehicle fires, and non-fire scenes demonstrate that the proposed framework outperforms existing cutting-edge techniques. The model also exhibited reduced computational complexity and enhanced inference speed, making it suitable for deployment in real-time applications on various hardware platforms. This study sets a new benchmark for fire detection and offers a powerful tool for early warning systems and emergency response initiatives. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Recognition of Diabetic Retinopathy Grades Based on Data Augmentation and Attention Mechanisms.
- Author
-
Li, Xueri, Wen, Li, Du, Fanyu, Yang, Lei, and Wu, Jianfang
- Subjects
*CONVOLUTIONAL neural networks, *DIABETIC retinopathy, *DEEP learning, *DATA augmentation, *DIABETES complications
- Abstract
Diabetic retinopathy is a complication of diabetes and one of the leading causes of vision loss. Early detection and treatment are essential to prevent vision loss. Deep learning has been making great strides in the field of medical image processing and can be used as an aid for medical practitioners. However, unbalanced datasets, sparse focal areas, small differences between adjacent disease grades, and varied manifestations of the same grade disease challenge deep learning model training. Generalization performance and robustness are inadequate. To address the problem of unbalanced sample numbers between classes in the dataset, this work proposes using VQ‐VAE for reconstructing affine transformed images to enrich and balance the dataset. Test results show the model's average reconstruction error is 0.0001, and the mean structural similarity between reconstructed and original images is 0.967. This proves reconstructed images differ from originals yet belong to the same category, expanding and diversifying the dataset. Addressing the issues of focal area sparsity and disease grade disparity, this work utilizes ResNeXt50 as the backbone network and constructs diverse attention networks by modifying the network structure and embedding different attention modules. Experiments demonstrate that the convolutional attention network outperforms ResNeXt50 in terms of Precision, Sensitivity, Specificity, F1 Score, Quadratic Weighted Kappa Coefficient, Accuracy, and robustness against Salt and Pepper noise, Gaussian noise, and gradient perturbation. Finally, the heat maps of each model recognizing the fundus image were plotted using the Grad‐CAM method. The heat maps show that the attentional network is more effective than the non‐attentional network ResNeXt50 at attending to the fundus image. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. SGW-YOLOv8n: An Improved YOLOv8n-Based Model for Apple Detection and Segmentation in Complex Orchard Environments.
- Author
-
Wu, Tao, Miao, Zhonghua, Huang, Wenlei, Han, Wenkai, Guo, Zhengwei, and Li, Tao
- Subjects
APPLE orchards, FRUIT harvesting, DEEP learning, MODEL validation, FRUIT
- Abstract
This study addresses the problem of detecting occluded apples in complex unstructured environments in orchards and proposes an apple detection and segmentation model based on an improved YOLOv8n, named SGW-YOLOv8n. The model improves apple detection and segmentation by combining the SPD-Conv convolution module, the GAM global attention mechanism, and the Wise-IoU loss function, which enhances the accuracy and robustness. The SPD-Conv module preserves fine-grained features in the image by converting spatial information into channel information, which is particularly suitable for small target detection. The GAM global attention mechanism enhances the recognition of occluded targets by strengthening the feature representation of channel and spatial dimensions. The Wise-IoU loss function further optimises the regression accuracy of the target frame. Finally, the pre-prepared dataset is used for model training and validation. The results show that the SGW-YOLOv8n model significantly improves relative to the original YOLOv8n in target detection and instance segmentation tasks, especially in occlusion scenes. The model improves the detection mAP to 75.9% and the segmentation mAP to 75.7% and maintains a processing speed of 44.37 FPS, which can meet the real-time requirements, providing effective technical support for the detection and segmentation of fruits in complex unstructured environments for fruit harvesting robots. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
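Note on the phrase "converting spatial information into channel information" in the abstract above: this is commonly realised as a space-to-depth rearrangement, which halves the spatial resolution while moving every pixel into an extra channel, so nothing is discarded. The sketch below is an illustration in plain Python under stated assumptions, not the SGW-YOLOv8n implementation. The function name, the nested-list tensor layout (channels, rows, columns), and the output channel ordering are all hypothetical choices.

```python
def space_to_depth(x, block=2):
    """Rearrange a C x H x W nested list into (C*block*block) x (H/block) x (W/block).

    Each output channel holds one of the block*block interleaved sub-grids
    of an input channel, so every input value appears exactly once.
    """
    height, width = len(x[0]), len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for dy in range(block):          # row offset inside each block
        for dx in range(block):      # column offset inside each block
            for channel in x:
                out.append([
                    [channel[i][j] for j in range(dx, width, block)]
                    for i in range(dy, height, block)
                ])
    return out


if __name__ == "__main__":
    image = [[[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11],
              [12, 13, 14, 15]]]      # 1 channel, 4 x 4
    for sub_grid in space_to_depth(image):
        print(sub_grid)
```

A strided convolution or pooling layer would drop or blend three of every four pixels at this step. The rearrangement keeps them all, which is the property the abstract credits for preserving fine-grained detail on small targets.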
46. MAPM: multiscale attention pre-training model for TextVQA.
- Author
-
Yang, Yue, Yu, Yue, and Li, Yingying
- Subjects
LANGUAGE models, QUESTION answering systems
- Abstract
The Text Visual Question Answering (TextVQA) task aims to enable models to read and answer questions based on images with text. Existing attention-based methods for TextVQA tasks often face challenges in effectively aligning local features between modalities during multimodal information interaction. This misalignment hinders their performance in accurately answering questions based on images with text. To address this issue, the Multiscale Attention Pre-training Model (MAPM) is proposed to enhance multimodal feature fusion. MAPM introduces multiscale attention modules, which facilitate fine-grained local feature enhancement and global feature fusion across modalities. By adopting these modules, MAPM achieves superior performance in aligning and integrating visual and textual information. Additionally, MAPM benefits from being pre-trained with scene text, employing three pre-training tasks: masked language model, visual region matching, and OCR visual text matching. This pre-training process establishes effective semantic alignment relationships among different modalities. Experimental evaluations demonstrate the superiority of MAPM, which achieves a 1.2% higher accuracy than state-of-the-art models on the TextVQA dataset, especially when handling numerical data within images. The Multiscale Attention Pre-training Model (MAPM) is proposed to enhance local fine-grained features (Joint Attention Module) and to address redundancy in global features (Global Attention Module) in the TextVQA task. Three pre-training tasks are designed to enhance the model's expressive power and address the issue of cross-modal semantic alignment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
47. Advancing Ton-Bag Detection in Seaport Logistics with an Enhanced YOLOv8 Algorithm.
- Author
-
Qiu, Xiulin, Zhang, Haozhi, Yuan, Chang, Liu, Qinghua, and Yao, Hongzhi
- Subjects
FREIGHT & freightage, MARINE terminals, BOOTSTRAP aggregation (Algorithms), HARBORS, BLOCK designs, TRANSPORTATION costs
- Abstract
Intelligent logistics and freight transportation are an important part of making port terminals intelligent. Traditional freight transportation suffers from inaccurate ton bag identification, high costs, large model sizes, and long computation times, all of which hinder meeting real-time requirements on resource-constrained operational equipment. To address these problems, this paper proposes an improved lightweight ton bag detection algorithm, YOLOv8-TB (YOLOv8-Ton Bag), which is optimized based on YOLOv8. Firstly, the improved LZKAC module is combined with SPPF to form a new SPPFLKZ module, which improves the feature expression performance. Then, with reference to spatial and channel reconstruction convolution and deformable convolution, the C2f-SCTT block is designed for the backbone network, which reduces the spatial and channel redundancy between features in the network. Finally, the C2f-ORECZ block based on a linear scaling layer is designed for the neck, which reduces the training overhead and strengthens the feature learning of the feature extraction network for targets in the complex background of the harbor; a 160 × 160 scale detection head is also added to strengthen small target detection. On the logistics ton bag operation dataset provided by shipping port enterprises, the improved algorithm improves on the original algorithm by 3.7% in mAP50 and 5% in mAP50-95, the model size is reduced by 4.42 MB, and the amount of model computation is only 8 G, so the algorithm can accurately detect logistics ton bags in real time. The superiority of the method is verified by comparing it with other classical target detection algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
48. Enhanced multi-view anomaly detection on attribute networks by truncated singular value decomposition.
- Author
-
Lee, Baozhen, Su, Yuwei, Kong, Qianwen, and Zhang, Tingting
- Abstract
In the field of attribute network anomaly detection, current research methodologies, such as reconstruction and contrastive learning, frequently face challenges including the minimal differentiation in embedding representations of normal and anomalous nodes, an excessive dependence on local information, and a susceptibility to noise from adjacent nodes. To overcome these limitations, this paper presents a novel approach: the Enhanced Multi-view Anomaly Detection on Attribute Networks by Truncated Singular Value Decomposition (EMTSVD) method. EMTSVD leverages TSVD to generate improved views of both attributes and structures. Through the use of a low-rank approximation matrix, EMTSVD effectively filters out noise and isolates critical structural and attribute information. This isolated information is subsequently incorporated into the node embedding representations, significantly enhancing the differentiation between normal and anomalous nodes. Moreover, EMTSVD employs an attention mechanism to integrate multiple views, effectively minimizing spatial feature redundancy and further diminishing the effects of noise disturbances. Empirical evidence highlights EMTSVD's adeptness at accurately identifying essential node information within networks. By bolstering the distinction in embedding representations between normal and anomalous nodes, EMTSVD markedly advances the precision of anomaly detection in attribute networks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
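The abstract of record 48 describes using truncated singular value decomposition to build a low-rank approximation matrix that filters out noise. A minimal NumPy sketch of that general technique, assuming a dense attribute matrix and a hand-picked rank, could look like this. The names `truncated_svd_approx`, `signal`, `noisy`, and `denoised` and the synthetic data are hypothetical; this illustrates plain TSVD only, not the EMTSVD method itself.

```python
import numpy as np

def truncated_svd_approx(matrix, rank):
    """Return the rank-`rank` truncated-SVD approximation of `matrix`.

    Keeping only the largest singular values retains the dominant structure
    and drops the low-energy directions, which is where small noise tends to sit.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Scale the kept left singular vectors by their singular values,
    # then project back with the kept right singular vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

rng = np.random.default_rng(0)
# Hypothetical node-attribute matrix: a rank-2 signal plus small noise.
signal = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8))
noisy = signal + 0.01 * rng.normal(size=signal.shape)
denoised = truncated_svd_approx(noisy, rank=2)
```

By the Eckart-Young theorem, the Frobenius-norm distance between `noisy` and `denoised` equals the root sum of squares of the discarded singular values, so the choice of `rank` directly controls how much of the matrix is treated as noise.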
49. RESEARCH ON KNOWLEDGE DISCOVERY AND SHARING IN AIGC VIRTUAL TEACHING AND RESEARCH ROOM EMPOWERED BY BIG DATA ANALYSIS AND NATURAL LANGUAGE PROCESSING ALGORITHMS.
- Author
-
LINGLING LI, PEIGANG WANG, and XUEBIAO NIU
- Subjects
DEEP reinforcement learning, REINFORCEMENT learning, VIRTUAL reality, STUDENT engagement, ONLINE education
- Abstract
This paper introduces a pioneering framework named Deep Reinforcement Learning based AI-Generated Content for Virtual Teaching (DRL-AIGC-VR), which aims to transform the landscape of online education and research. At the heart of this system is the integration of Deep Reinforcement Learning (DRL) and Natural Language Processing (NLP), making it exceptionally suited for the dynamic and evolving environment of virtual teaching and research rooms. The uniqueness of DRL-AIGC-VR lies in its adaptive content curation and presentation capabilities, achieved through a combination of Deep Q-Networks (DQN) with attention mechanisms. This innovative approach allows the system to personalize learning experiences by tailoring them to individual student performance and engagement levels. Simultaneously, it focuses on presenting the most pertinent information, thereby streamlining and optimizing the learning process. One of the most significant features of this system is its ability to handle and analyze large-scale educational data, a vital aspect in today's big data-driven world. This capability ensures that DRL-AIGC-VR offers a highly interactive, responsive, and efficient learning environment, addressing the varied requirements of students and researchers. The implementation of DRL-AIGC-VR in virtual educational settings has shown remarkable improvements in several key areas, including learning outcomes, student engagement, and knowledge retention. These enhancements are indicative of the substantial progress that the framework brings to the domain of virtual education, positioning it as a leading solution in the realm of AI-driven learning platforms. Overall, DRL-AIGC-VR represents a significant step forward in harnessing the power of AI to enrich and elevate the educational experience in virtual settings, paving the way for more advanced, personalized, and effective online learning and research methodologies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
50. MTAF–DTA: multi-type attention fusion network for drug–target affinity prediction
- Author
-
Jinghong Sun, Han Wang, Jia Mi, Jing Wan, and Jingyang Gao
- Subjects
Drug–target binding affinity, Multi-modal features, Attention mechanisms, Nested fusion networks, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: The development of drug–target binding affinity (DTA) prediction tasks significantly drives the drug discovery process forward. Leveraging the rapid advancement of artificial intelligence, DTA prediction tasks have undergone a transformative shift from wet lab experimentation to machine learning-based prediction. This transition enables a more expedient exploration of potential interactions between drugs and targets, leading to substantial savings in time and funding resources. However, existing methods still face several challenges, such as drug information loss, lack of calculation of the contribution of each modality, and lack of simulation of the drug–target binding mechanisms. Results: We propose MTAF–DTA, a method for drug–target binding affinity prediction that addresses the above problems. The drug representation module extracts three modalities of features from drugs and uses an attention mechanism to update their respective contribution weights. Additionally, we design a Spiral-Attention Block (SAB) as the drug–target feature fusion module based on multi-type attention mechanisms, facilitating a triple fusion process between them. The SAB, to some extent, simulates the interactions between drugs and targets, thereby enabling outstanding performance in the DTA task. Our regression task on the Davis and KIBA datasets demonstrates the predictive capability of MTAF–DTA, with CI and MSE metrics showing respective improvements of 1.1% and 9.2% over the state-of-the-art (SOTA) method in the novel target settings. Furthermore, downstream tasks further validate MTAF–DTA’s superiority in DTA prediction. Conclusions: Experimental results and a case study demonstrate the superior performance of our approach in DTA prediction tasks, showing its potential in practical applications such as drug discovery and disease treatment.
- Published
- 2024