Tools for Semi-automatic Preparation of Training Data for OCR
- Source :
- IFIP Advances in Information and Communication Technology ISBN: 9783030198220, AIAI
- Publication Year :
- 2019
- Publisher :
- Springer International Publishing, 2019.
Abstract
- This work addresses data preparation for OCR systems based on recurrent neural networks. Precisely annotated data are necessary both for training a network and for evaluating OCR methods. Such data can be synthesized; however, synthetic data are not as realistic as real data. Manual annotation is thus still needed in many cases, especially for the historical documents we focus on. Although there are several complex systems for historical document processing, to the best of our knowledge a simple annotation tool for OCR data is missing. Therefore, we propose and implement a set of tools utilizing artificial intelligence that simplify the annotation process. These tools create ground truths for line images, which are used to train current OCR systems. Another contribution of this paper is making these tools freely available for research purposes.
- Subjects :
- Information retrieval
SIMPLE (military communications protocol)
Artificial neural network
Computer science
Process (engineering)
Complex system
02 engineering and technology
01 natural sciences
010309 optics
Set (abstract data type)
Annotation
Recurrent neural network
0103 physical sciences
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Historical document
Details
- ISBN :
- 978-3-030-19822-0
- ISBNs :
- 9783030198220
- Database :
- OpenAIRE
- Journal :
- IFIP Advances in Information and Communication Technology ISBN: 9783030198220, AIAI
- Accession number :
- edsair.doi...........2e9dc8f0b6bdb7efdb7d8979caa7fb77