Ground-truth generation through crowdsourcing with probabilistic indexes.
- Source :
- Neural Computing & Applications. Oct 2024, Vol. 36 Issue 30, p18879-18895. 17p.
- Publication Year :
- 2024
Abstract
- Automatic transcription of large series of historical handwritten documents generally aims to make the textual information in these documents searchable. However, automatic transcripts often lack the accuracy needed for reliable text indexing and search. Probabilistic Indexing (PrIx) offers a unique alternative to raw transcripts. Since it needs training data to achieve good search performance, PrIx-based crowdsourcing techniques are introduced in this paper to gather the required data. In the proposed approach, PrIx confidence measures are used to drive a correction process in which users can amend errors and possibly add missing text. In a further step, the corrected data are used to retrain the PrIx models. Results on five large series are reported, which show consistent improvements after retraining. However, it is debatable whether the improvements justify the overall cost of the crowdsourcing operation, or whether it would have been more cost-effective to start with a larger and cleaner set of professionally produced training transcripts. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 0941-0643
- Volume :
- 36
- Issue :
- 30
- Database :
- Academic Search Index
- Journal :
- Neural Computing & Applications
- Publication Type :
- Academic Journal
- Accession number :
- 179738894
- Full Text :
- https://doi.org/10.1007/s00521-024-10188-0