DeepLontar dataset for handwritten Balinese character detection and syllable recognition on Lontar manuscript.

Authors :
Siahaan, Daniel
Sutramiani, Ni Putu
Suciati, Nanik
Duija, I Nengah
Darma, I Wayan Agus Surya
Source :
Scientific Data; 12/10/2022, Vol. 9 Issue 1, p1-7, 7p
Publication Year :
2022

Abstract

The digitization of traditional Palmyra manuscripts, such as Lontar, is the government's main focus in its efforts to preserve Balinese culture. Digitization is done by acquiring Lontar manuscripts as photos or scans. To understand a Lontar's contents, experts usually transliterate it. Automatic transliteration using computer vision is generally carried out in several stages: character detection, character recognition, syllable recognition, and word recognition. Many methods can be used for detection and recognition, but they all need data to train and evaluate the resulting model, and that data must be processed and labelled when the dataset is compiled. This paper presents the data collection and the construction of datasets for detection and recognition tasks. The Lontar manuscripts were collected from libraries at universities in Bali. Data generation was carried out to produce 400 augmented images from 200 original Lontar images in order to increase the variety of the data. Each character was annotated with a label, producing over 100,000 characters in 55 character classes. This dataset can be used to train and evaluate models for character detection and syllable recognition on new manuscripts.

Measurement(s) :
accuracy • precision • recall • F1-score
Technology Type(s) :
Python
Factor Type(s) :
bounding box • frequency • spatial location • dimensional size
Sample Characteristic - Organism :
lontar • manuscript • glyphs
Sample Characteristic - Environment :
Balinese glyphs
Sample Characteristic - Location :
Bali Province

[ABSTRACT FROM AUTHOR]
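The record lists precision, recall, and F1-score as measurements and bounding boxes as a factor type, but it does not say how detections are matched to annotations. The following is a minimal sketch, not the authors' evaluation code. It assumes boxes given as (x_min, y_min, x_max, y_max), a greedy one-to-one match between predictions and same-class annotations, and an IoU threshold of 0.5. The function names and the class labels "na", "ka", and "ta" are hypothetical.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Labelled = Tuple[str, Box]  # (class label, bounding box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_scores(
    truth: List[Labelled],
    predicted: List[Labelled],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using greedy one-to-one matching."""
    matched = set()  # indices of annotations already claimed by a prediction
    true_pos = 0
    for p_label, p_box in predicted:
        best_idx, best_iou = -1, iou_threshold
        for i, (t_label, t_box) in enumerate(truth):
            if i in matched or t_label != p_label:
                continue
            score = iou(p_box, t_box)
            if score >= best_iou:
                best_idx, best_iou = i, score
        if best_idx >= 0:
            matched.add(best_idx)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(truth) - true_pos
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations and detections for one manuscript image.
truth = [("na", (0, 0, 10, 10)), ("ka", (20, 0, 30, 10))]
predicted = [
    ("na", (1, 0, 11, 10)),    # overlaps the "na" annotation well
    ("ka", (50, 50, 60, 60)),  # right class, wrong place
    ("ta", (20, 0, 30, 10)),   # right place, wrong class
]
precision, recall, f1 = detection_scores(truth, predicted)
```

In this toy case only the first prediction counts as a true positive, so precision is 1/3 and recall is 1/2. Published detection benchmarks often process predictions in order of confidence and average over several IoU thresholds; this sketch omits both.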

Details

Language :
English
ISSN :
20524463
Volume :
9
Issue :
1
Database :
Complementary Index
Journal :
Scientific Data
Publication Type :
Academic Journal
Accession number :
160703232
Full Text :
https://doi.org/10.1038/s41597-022-01867-5