Ground-truth generation through crowdsourcing with probabilistic indexes.

Authors :
Sánchez, Joan Andreu
Vidal, Enrique
Bosch, Vicente
Quirós, Lorenzo
Source :
Neural Computing & Applications. Oct2024, Vol. 36 Issue 30, p18879-18895. 17p.
Publication Year :
2024

Abstract

Automatic transcription of large series of historical handwritten documents generally aims at allowing to search for textual information in these documents. However, automatic transcripts often lack the level of accuracy needed for reliable text indexing and search purposes. Probabilistic Indexing (PrIx) offers a unique alternative to raw transcripts. Since it needs training data to achieve good search performance, PrIx-based crowdsourcing techniques are introduced in this paper to gather the required data. In the proposed approach, PrIx confidence measures are used to drive a correction process in which users can amend errors and possibly add missing text. In a further step, corrected data are used to retrain the PrIx models. Results on five large series are reported which show consistent improvements after retraining. However, it can be argued whether the overall costs of the crowdsourcing operation pay off for the improvements, or perhaps it would have been more cost-effective to just start with a larger and cleaner amount of professionally produced training transcripts. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
09410643
Volume :
36
Issue :
30
Database :
Academic Search Index
Journal :
Neural Computing & Applications
Publication Type :
Academic Journal
Accession number :
179738894
Full Text :
https://doi.org/10.1007/s00521-024-10188-0