Automatic CNN-based Arabic numeral spotting and handwritten digit recognition by using deep transfer learning in Ottoman population registers
- Source :
- Applied Sciences, Volume 10, Issue 16, Article 5430 (2020)
- Publication Year :
- 2020
- Publisher :
- Multidisciplinary Digital Publishing Institute (MDPI), 2020.
Abstract
- Historical manuscripts and archival documents are handwritten texts that serve as the backbone sources for historical inquiry. Recent developments in the digital humanities and the need to extract information from historical documents have accelerated digitization. Cutting-edge machine learning methods are applied to extract meaning from these documents: page segmentation (layout analysis), keyword, number and symbol spotting, and handwritten text recognition algorithms have all been tested on historical documents. For most languages, these techniques are widely studied and high-performance methods have been developed. However, the properties of Arabic script (e.g., diacritics, varying script styles, and ligatures) create additional problems for these algorithms, and research on it therefore remains limited. In this research, we first automatically spotted the Arabic numerals in the very first series of Ottoman Empire population registers, compiled in the mid-nineteenth century, and then recognized these numbers. The numerals are important because they hold information about the number of households, the registered individuals, and the individuals' ages. Taking advantage of the structure of the studied registers, in which numerals are written in red, we applied a red color filter to separate the numerals from the rest of the document. We then used a CNN-based segmentation method to spot these numerals. In the second part, we annotated a local Arabic handwritten digit dataset by selecting the single-digit numerals among those spotted, and tested deep transfer learning from large open Arabic handwritten digit datasets for digit recognition.
We achieved promising results for recognizing digits in these historical documents.
- Funding :
- European Research Council (ERC) project "Industrialisation and Urban Growth from the mid-nineteenth century Ottoman Empire to Contemporary Turkey in a Comparative Perspective, 1850-2000" (UrbanOccupationsOETR); European Union; Horizon 2020
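The abstract says only that a red color filter separates the numerals because they are written in red; it does not state the color space or thresholds used. The sketch below illustrates the idea under assumed HSV thresholds (`HUE_TOLERANCE`, `MIN_SATURATION` and `MIN_VALUE` are hypothetical values, not the authors'). It then groups the red pixels into candidate bounding boxes with a simple connected-component flood fill. That grouping step is a classical stand-in chosen for illustration only; the paper itself uses a CNN-based segmentation method for spotting.

```python
import colorsys
from collections import deque

# Hypothetical thresholds: the paper's actual filter parameters are not given here.
HUE_TOLERANCE = 0.05   # fraction of the hue circle on either side of pure red (0.0 / 1.0)
MIN_SATURATION = 0.4   # rejects greyish paper and black ink
MIN_VALUE = 0.2        # rejects very dark pixels


def is_red(rgb):
    """Return True if an (r, g, b) pixel with 0-255 channels looks red in HSV space."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Red hue wraps around 0.0 / 1.0, so accept both ends of the hue circle.
    near_red = h <= HUE_TOLERANCE or h >= 1.0 - HUE_TOLERANCE
    return near_red and s >= MIN_SATURATION and v >= MIN_VALUE


def red_mask(image):
    """image: list of rows of (r, g, b) tuples -> list of rows of booleans."""
    return [[is_red(px) for px in row] for row in image]


def red_regions(mask):
    """Group 4-connected red pixels; return boxes as (top, left, bottom, right)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited red pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Tiny synthetic "page": red ink, black ink and off-white paper.
RED, BLACK, PAPER = (200, 30, 30), (20, 20, 20), (235, 225, 200)
page = [
    [PAPER, RED,   PAPER, PAPER, BLACK],
    [PAPER, RED,   PAPER, PAPER, BLACK],
    [PAPER, PAPER, PAPER, RED,   RED],
]
print(red_regions(red_mask(page)))  # [(0, 1, 1, 1), (2, 3, 2, 4)]
```

Working in HSV rather than thresholding the raw red channel keeps black ink (low saturation) and aged paper (low saturation, yellowish hue) out of the mask; on real scans the thresholds would need tuning against the registers themselves.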
- Subjects :
- Computer science
Population
02 engineering and technology
Convolutional neural network
lcsh:Technology
Arabic numerals
Numeral system
lcsh:Chemistry
0202 electrical engineering, electronic engineering, information engineering
General Materials Science
education
Instrumentation
lcsh:QH301-705.5
Digitization
Fluid Flow and Transfer Processes
lcsh:T
Process Chemistry and Technology
General Engineering
020206 networking & telecommunications
Chemistry
Engineering
Materials science
Physics
Spotting
Numerical digit
lcsh:QC1-999
Computer Science Applications
Convolutional neural networks
Deep transfer learning
Handwritten digit recognition
Historical document analysis
Numeral spotting
lcsh:Biology (General)
lcsh:QD1-999
lcsh:TA1-2040
020201 artificial intelligence & image processing
Artificial intelligence
Transfer of learning
business
lcsh:Engineering (General). Civil engineering (General)
computer
Natural language processing
lcsh:Physics
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....3bb71010da6f0f41ada65ee639f5f58d