1. LA-ViT: A Network With Transformers Constrained by Learned-Parameter-Free Attention for Interpretable Grading in a New Laryngeal Histopathology Image Dataset
- Author
- Huang, Pan; Xiao, Hualiang; He, Peng; Li, Chentao; Guo, Xiaodong; Tian, Sukun; Feng, Peng; Chen, Hu; Sun, Yuchun; Mercaldo, Francesco; Santone, Antonella; Qin, Jing
- Abstract
- Grading laryngeal squamous cell carcinoma (LSCC) from histopathological images is a clinically significant yet challenging task. Low-effect background semantic information accumulates in the feature maps, feature channels, and class activation maps, seriously degrading the accuracy and interpretability of LSCC grading. Because the traditional transformer block relies heavily on parameterized attention, the model overlearns this low-effect background semantic information and fails to effectively reduce the proportion of background semantics. We therefore propose an end-to-end network with transformers constrained by learned-parameter-free attention (LA-ViT), which improves the ability to learn high-effect target semantic information and reduces the proportion of background semantics. First, using a generalized linear model and probabilistic analysis, we demonstrate that learned-parameter-free attention (LA) learns highly effective target semantic information better than parameterized attention. Second, the first-type LA transformer block of LA-ViT derives the query from the feature-map position subspace, derives the key from the feature-channel subspace, and obtains the value by average pooling; together these form the LA mechanism, which reduces the proportion of background semantics in the feature maps and feature channels. Third, the second-type LA transformer block of LA-ViT derives the key and query from the model's probability-matrix information and decision-level weight information, respectively, again realizing the LA mechanism and thereby reducing the proportion of background semantics in the class activation maps. Finally, we build a new complex-semantic LSCC pathology image dataset to address the scarcity of research on LSCC grading models caused by the lack of clinically meaningful datasets. In extensive experiments, LA-ViT outperforms other state-of-the-art methods on all metrics, and its visualization maps match the regions of interest in pathologists' decision-making more closely. Moreover, experiments on a public LSCC pathology image dataset show that LA-ViT generalizes better than other state-of-the-art methods.
- Published
- 2024
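The abstract above describes the first-type LA transformer block as building its attention from the feature map itself: the query from the position subspace, the key from the channel subspace, and the value by average pooling, with no learned projection matrices. The sketch below is a minimal, illustrative interpretation of such a learned-parameter-free attention step; it is not the authors' implementation, and the function name, tensor shapes, and the choice of a 3x3 local average pool for the value are assumptions.

```python
# Hypothetical sketch of a learned-parameter-free attention (LA) step,
# loosely following the abstract's description; not the paper's code.
import math
import torch
import torch.nn.functional as F


def learned_parameter_free_attention(x: torch.Tensor) -> torch.Tensor:
    """Parameter-free attention over a feature map x of shape (B, C, H, W).

    Query: the feature map viewed by position (one row per spatial location).
    Key:   the feature map viewed by channel (transposed view of the same tensor).
    Value: a locally average-pooled copy of the feature map (no learned weights).
    """
    b, c, h, w = x.shape

    q = x.flatten(2).transpose(1, 2)                          # (B, N, C): queries from the position subspace
    k = x.flatten(2)                                          # (B, C, N): keys from the channel subspace
    v = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)   # average pooling yields the value (assumed local pool)
    v = v.flatten(2).transpose(1, 2)                          # (B, N, C)

    attn = torch.softmax(q @ k / math.sqrt(c), dim=-1)        # (B, N, N) position-to-position affinities
    out = attn @ v                                            # (B, N, C) re-weighted values
    return out.transpose(1, 2).reshape(b, c, h, w) + x        # residual connection back to the input


if __name__ == "__main__":
    feats = torch.randn(2, 64, 14, 14)                        # toy feature map
    print(learned_parameter_free_attention(feats).shape)      # torch.Size([2, 64, 14, 14])
```

The point of the design, as the abstract argues, is that no W_q, W_k, or W_v matrices are learned, so the attention weights cannot be fitted to low-effect background semantics; how the second-type LA block combines the probability matrix and decision-level weights is not specified in enough detail here to sketch.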