The Impact of an Attention Mechanism on the Representations in Neural Networks, Focusing on Catastrophic Forgetting and Robustness to Input Noise
- Publication Year :
- 2024
Abstract
- This study explores how attention mechanisms affect the distribution of representations within neural networks, focusing on catastrophic forgetting and robustness to input noise. We compare the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models, their attention-enhanced counterparts (RNNA, LSTMA, GRUA), and the Transformer model using musical sequences from "Daisy Bell". A key finding is the difference in how these models distribute information across their representations. Base models such as the RNN, LSTM, and GRU concentrate information within a few specific nodes, whereas attention-enhanced models spread information across more nodes and demonstrate greater robustness to input noise, as shown by significant differences in performance deterioration between base models and their attention-augmented versions. However, base models such as the RNN and GRU resist catastrophic forgetting better than their attention-enhanced counterparts. Despite this, attention models show a positive correlation between higher overlap percentages in their representations and accuracy on certain tasks, and a negative correlation between accuracy and the number of empty nodes. The Transformer model stands out by maintaining high accuracy across tasks, likely owing to its self-attention mechanism. These results suggest that while attention mechanisms enhance robustness to noise, further research is needed to address catastrophic forgetting in neural networks.
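- For illustration only, the sketch below shows one plausible form of the attention-enhanced recurrent models compared in the abstract (e.g., the GRUA variant) together with a simple input-noise probe. This is not the authors' implementation: the additive-attention pooling, the next-token prediction setup, the hidden size, and the Gaussian-noise protocol are all assumptions made for the sake of a runnable example.

```python
# A minimal sketch (not the authors' code) of an attention-enhanced GRU
# ("GRUA") that pools hidden states with additive attention. All
# hyperparameters and the noise probe are illustrative assumptions.
import torch
import torch.nn as nn

class GRUA(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.attn = nn.Linear(hidden_size, 1)   # additive attention scores
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens: torch.Tensor, noise_sigma: float = 0.0) -> torch.Tensor:
        x = self.embed(tokens)                   # (batch, time, hidden)
        if noise_sigma > 0:                      # optional input-noise perturbation
            x = x + noise_sigma * torch.randn_like(x)
        h, _ = self.gru(x)                       # one hidden state per time step
        w = torch.softmax(self.attn(h), dim=1)   # attention weights over time
        context = (w * h).sum(dim=1)             # attention-weighted pooling
        return self.out(context)                 # next-token logits

def accuracy(model: GRUA, tokens: torch.Tensor, targets: torch.Tensor,
             noise_sigma: float = 0.0) -> float:
    """Accuracy under input noise; compare noise_sigma=0 vs. noise_sigma>0
    to estimate the performance deterioration discussed in the abstract."""
    with torch.no_grad():
        preds = model(tokens, noise_sigma).argmax(dim=-1)
        return (preds == targets).float().mean().item()
```

- Comparing, say, accuracy(model, x, y, 0.0) against accuracy(model, x, y, 0.5) yields a simple deterioration measure analogous to the noise-robustness comparison the abstract reports between base and attention-augmented models.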
Details
- Database :
- OAIster
- Notes :
- application/pdf, English
- Publication Type :
- Electronic Resource
- Accession Number :
- edsoai.on1457631667
- Document Type :
- Electronic Resource