STEF: a Swin Transformer-Based Enhanced Feature Pyramid Fusion Model for Dongba character detection

Authors :
Yuqi Ma
Shanxiong Chen
Yongbo Li
Jingliu He
Qiuyue Ruan
Wenjun Xiao
Hailing Xiong
XiaoLiang Li
Source :
Heritage Science, Vol 12, Iss 1, Pp 1-17 (2024)
Publication Year :
2024
Publisher :
SpringerOpen, 2024.

Abstract

The Dongba manuscripts are a unique primitive pictographic writing system that originated among the Naxi people of Lijiang, China, boasting over a thousand years of history. The uniqueness of the Dongba manuscripts stems from their pronounced pictorial and ideographic characteristics. However, the digital preservation and inheritance of Dongba manuscripts face multiple challenges, including extracting their rich semantic information, recognizing individual characters, retrieving Dongba manuscripts, and automatically interpreting the meanings of Dongba manuscripts. Developing efficient Dongba character detection technology has become a key research focus, wherein establishing a standardized Dongba detection dataset is crucial for training and evaluating techniques. In this study, we have created a comprehensive Dongba manuscripts detection dataset covering various commonly used Dongba characters and vocabularies. Additionally, we propose a model named STEF. Firstly, the Swin Transformer extracts features that capture the complex structures and diverse shapes of Dongba manuscripts. Then, by introducing a Feature Pyramid Enhancement Module, features of different sizes are cascaded to preserve multi-scale information. Subsequently, all features are fused in a FUSION module, resulting in features of various Dongba manuscript styles. Each pixel’s binarisation threshold is dynamically adjusted through a differentiable binarisation operation, accurately distinguishing between foreground Dongba manuscripts and background. Lastly, deformable convolution is introduced, allowing the model to dynamically adjust the convolution kernel’s size and shape based on the Dongba manuscripts’ size, thereby better capturing the detailed information of Dongba characters of different sizes. Experimental results show that STEF achieves a recall rate of 88.88%, a precision rate of 88.65%, and an F-measure of 88.76%, outperforming other text detection algorithms.
Visualization experiments demonstrate that STEF performs well in detecting Dongba manuscripts of various sizes, shapes, and styles, especially in blurred handwriting and complex backgrounds.
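
The abstract names two steps that are easy to illustrate in isolation: the per-pixel differentiable binarisation and the F-measure derived from precision and recall. The following is a minimal Python sketch, not the authors' implementation. It assumes the sigmoid-style approximation commonly used for differentiable binarisation, B = 1 / (1 + exp(-k (P - T))), where P is a predicted probability, T is the learned per-pixel threshold, and k is an amplification factor. The value k = 50, the function names, and the toy maps are illustrative assumptions that do not come from this record.

```python
import math


def differentiable_binarisation(prob, thresh, k=50.0):
    """Soft binary value for one pixel (assumed sigmoid form, k is an assumption).

    Returns a value near 1 when prob is above thresh, near 0 when below,
    and exactly 0.5 when the two are equal.
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarise_map(prob_map, thresh_map, k=50.0):
    """Apply the soft binarisation pixel by pixel to two equally sized 2D lists."""
    return [
        [differentiable_binarisation(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 2x2 probability and threshold maps, values in [0, 1].
    prob_map = [[0.9, 0.2], [0.55, 0.5]]
    thresh_map = [[0.3, 0.3], [0.5, 0.6]]
    for row in binarise_map(prob_map, thresh_map):
        print([round(v, 3) for v in row])

    # Consistency check on the figures reported in the abstract (percent).
    print(round(f_measure(88.65, 88.88), 2))
```

With the reported precision of 88.65% and recall of 88.88%, the harmonic mean rounds to 88.76%, which agrees with the F-measure stated in the abstract. Because the threshold is supplied per pixel rather than as one global constant, the same probability can be classed as foreground in one region and background in another, which is the behaviour the abstract describes.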

Details

Language :
English
ISSN :
20507445
Volume :
12
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Heritage Science
Publication Type :
Academic Journal
Accession number :
edsdoj.9f75d6b711a9494db438f8af3edfa7d8
Document Type :
article
Full Text :
https://doi.org/10.1186/s40494-024-01321-2