
Construction of language models for a handwritten mail reading system

Authors :
Laurence Likforman-Sulem
Olivier Morillot
Emmanuèle Grosicki
Laboratoire Traitement et Communication de l'Information (LTCI)
Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS)
CEP Arcueil (DGA/CTA/DT/GIP)
Délégation Générale pour l'Armement
Source :
DRR, Document Recognition and Retrieval XIX, IS&T/SPIE 24th Annual Symposium on Electronic Imaging, Jan 2012, San Francisco, United States. ⟨10.1117/12.911965⟩
Publication Year :
2012
Publisher :
SPIE, 2012.

Abstract

This paper presents a system for the recognition of unconstrained handwritten mail. The core of this system is an HMM recognizer that uses trigraphs to model contextual information. The recognizer does not require any segmentation into words or characters and works directly at the line level. To take linguistic information into account and improve performance, a language model is introduced. This language model is based on bigrams and is built only from the transcriptions of the training documents. Experiments were conducted with various vocabulary sizes and language models. Word Error Rate and perplexity values are compared to show the benefit of specific language models fitted to the handwritten mail recognition task.
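A trigraph is a character model conditioned on its left and right neighbours. The sketch below shows how a transcribed text line could be expanded into trigraph labels. It assumes the HTK-style "left-centre+right" notation and a hypothetical boundary symbol "#" at the line ends; both are illustrative choices, not details taken from the paper, and the function name to_trigraphs is hypothetical. In a full recognizer, rare trigraphs are typically tied or clustered so that their HMMs can be trained, which is omitted here.

```python
from typing import List


def to_trigraphs(text: str, boundary: str = "#") -> List[str]:
    """Expand a text line into context-dependent character labels.

    Each character c with left neighbour l and right neighbour r becomes
    the label "l-c+r" (HTK-style notation, assumed for illustration).
    The start and end of the line are padded with a boundary symbol.
    """
    padded = [boundary] + list(text) + [boundary]
    labels = []
    for i in range(1, len(padded) - 1):
        left, centre, right = padded[i - 1], padded[i], padded[i + 1]
        labels.append(f"{left}-{centre}+{right}")
    return labels


if __name__ == "__main__":
    print(to_trigraphs("mail"))
```

Because the system works at line level, spaces between words would simply appear as ordinary centre or context characters in such labels.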
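The following is a minimal sketch of a word bigram model estimated only from training transcriptions, with the vocabulary restricted to the N most frequent words so that different vocabulary sizes can be compared. Perplexity is computed as PP = exp(-(1/N) Σ log P(w_i | w_{i-1})), where N counts every predicted token including the end-of-line marker. The abstract does not state which smoothing method was used; add-one (Laplace) smoothing is swapped in here only because it is short, and the markers <s>, </s>, <unk> as well as the names build_vocab and BigramLM are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List, Set

BOS, EOS, UNK = "<s>", "</s>", "<unk>"  # assumed boundary / unknown-word markers


def build_vocab(lines: Iterable[List[str]], size: int) -> Set[str]:
    """Keep the `size` most frequent training words; all others become <unk>."""
    counts = Counter(word for line in lines for word in line)
    return {word for word, _ in counts.most_common(size)}


class BigramLM:
    """Word bigram model with add-one (Laplace) smoothing (illustrative choice)."""

    def __init__(self, lines: Iterable[List[str]], vocab: Set[str]) -> None:
        self.vocab = set(vocab)
        # Tokens the model can predict: in-vocabulary words, <unk> and </s>.
        self.predictable = self.vocab | {UNK, EOS}
        self.history_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for line in lines:
            tokens = self._tokens(line)
            for prev, cur in zip(tokens, tokens[1:]):
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def _tokens(self, line: List[str]) -> List[str]:
        """Map out-of-vocabulary words to <unk> and add line boundaries."""
        return [BOS] + [w if w in self.vocab else UNK for w in line] + [EOS]

    def prob(self, prev: str, cur: str) -> float:
        """Smoothed P(cur | prev); sums to 1 over self.predictable."""
        v = len(self.predictable)
        return (self.bigram_counts[(prev, cur)] + 1) / (self.history_counts[prev] + v)

    def perplexity(self, lines: Iterable[List[str]]) -> float:
        """exp of the average negative log-probability per predicted token."""
        log_prob, n_tokens = 0.0, 0
        for line in lines:
            tokens = self._tokens(line)
            for prev, cur in zip(tokens, tokens[1:]):
                log_prob += math.log(self.prob(prev, cur))
                n_tokens += 1
        return math.exp(-log_prob / n_tokens)


if __name__ == "__main__":
    train = [["dear", "sir"], ["dear", "madam"], ["dear", "sir", "thanks"]]
    vocab = build_vocab(train, size=10)
    lm = BigramLM(train, vocab)
    print(lm.perplexity([["dear", "sir"]]))  # ≈ 3.0
```

Lowering the size argument of build_vocab maps more words to <unk>, which is one simple way to mimic the vocabulary-size experiments mentioned in the abstract.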
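Word Error Rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words, with the counts taken from a minimum-cost word-level alignment. The sketch below computes it with a standard Levenshtein dynamic program; the function name word_error_rate and the example sentences are hypothetical, and the paper's own scoring tool is not specified in the abstract.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (S + D + I) / len(reference), via word-level Levenshtein distance."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # d[i][j] = edit distance between reference[:i] and hypothesis[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions
    for j in range(cols):
        d[0][j] = j  # j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # match or substitution
            )
    return d[-1][-1] / len(reference)


if __name__ == "__main__":
    ref = "please find enclosed my invoice".split()
    hyp = "please find enclose invoice".split()
    print(word_error_rate(ref, hyp))  # 1 substitution + 1 deletion over 5 words
```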

Details

ISSN :
0277-786X
Database :
OpenAIRE
Journal :
SPIE Proceedings
Accession number :
edsair.doi.dedup.....f0a3778ba009fc856061d9c4bbdfa474
Full Text :
https://doi.org/10.1117/12.911965