3 results for "text-line recognition"
Search Results
2. BLSTM-based handwritten text recognition using Web resources
- Author
- Laurence Likforman-Sulem, Cristina Oprean, Adrian Popescu, Chafic Mokbel
- Affiliations: Institut Mines-Télécom [Paris] (IMT); Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris; University of Balamand [Liban] (UOB); Département Intelligence Ambiante et Systèmes Interactifs (DIASI), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay
- Subjects
Handwriting recognition, Computer science, Speech recognition, Character recognition, Isolated word recognition, Computational linguistics, Line segmentation, Segmentation, [INFO]Computer Science [cs], Text-line recognition, Character (computing), Confidence score, Dynamic dictionaries, Hand-written text recognition, World Wide Web, Word recognition, Language model, Artificial intelligence, Line (text file), Web resource, Out of vocabulary words, Word (computer architecture), Natural language processing
- Abstract
Conference: 13th International Conference on Document Analysis and Recognition, ICDAR 2015; Conference Date: 23 August 2015 through 26 August 2015; Conference Code: 118256. Handwriting recognition systems usually rely on static dictionaries and language models. Full coverage by these dictionaries is generally not achieved when dealing with unrestricted document corpora, due to the presence of Out-Of-Vocabulary (OOV) words. In a previous work, dynamic dictionaries were built from Web resources and successfully applied to isolated word recognition. In the present work we extend this approach to text-line recognition. Line segmentation into words is needed to exploit dynamic dictionaries, and it is performed using BLSTM classifiers to align filler models and word sequence outputs. Words are then classified, based on their confidence score, into anchor and non-anchor words (AWs and NAWs). AWs are equated to the BLSTM outputs and used as such. Dynamic dictionaries are built for NAWs by exploiting Web resources for their character sequence and for neighboring AWs. Text-lines are then decoded again using the dynamic dictionaries and a re-estimated language model. We conduct experiments on the publicly available RIMES database and show that the introduction of the dynamic dictionary is beneficial. Equally important, we show that the gain increases as the proportion of OOVs increases.
- Published
- 2015
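The anchor / non-anchor split described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_anchor_words`, the `(text, confidence)` input shape, and the 0.8 threshold are all assumptions made for illustration, and the abstract does not say how the confidence score is computed or thresholded.

```python
from typing import List, Tuple

Word = Tuple[str, float]      # (recognized text, confidence score); assumed shape
Indexed = Tuple[int, str]     # (position in the line, recognized text)


def split_anchor_words(
    words: List[Word], threshold: float = 0.8
) -> Tuple[List[Indexed], List[Indexed]]:
    """Split recognized words into anchor and non-anchor words.

    Words whose confidence reaches the (hypothetical) threshold are treated
    as anchor words and kept as recognized; the rest are non-anchor words
    that would be re-decoded with a dynamic dictionary.
    """
    anchors: List[Indexed] = []
    non_anchors: List[Indexed] = []
    for index, (text, confidence) in enumerate(words):
        if confidence >= threshold:
            anchors.append((index, text))
        else:
            non_anchors.append((index, text))
    return anchors, non_anchors


# Hypothetical recognizer output for one text line.
line = [("Bonjour", 0.95), ("Mme", 0.42), ("Dupont", 0.88)]
anchors, non_anchors = split_anchor_words(line)
```

Keeping the word positions makes it possible to look up the anchor words that neighbor each non-anchor word, which the abstract says are used when building the dynamic dictionaries.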
3. Construction of language models for an handwritten mail reading system
- Author
- Laurence Likforman-Sulem, Olivier Morillot, Emmanuèle Grosicki
- Affiliations: Laboratoire Traitement et Communication de l'Information (LTCI), Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS); CEP Arcueil (DGA/CTA/DT/GIP), Délégation Générale pour l'Armement
- Subjects
language modeling, Vocabulary, Perplexity, Computer science, Bigram, Speech recognition, Word error rate, Intelligent word recognition, Offline Handwriting recognition, Rule-based machine translation, Transcription (linguistics), handwritten mail, Hidden Markov model, Hidden Markov Models, text-line recognition, n-gram, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, ComputingMethodologies_PATTERNRECOGNITION, Cache language model, Artificial intelligence, Language model, Natural language processing
- Abstract
This paper presents a system for the recognition of unconstrained handwritten mail. The main part of this system is an HMM recognizer which uses trigraphs to model contextual information. This recognition system does not require any segmentation into words or characters and works directly at the line level. To take linguistic information into account and enhance performance, a language model is introduced. This language model is based on bigrams and built from training document transcriptions only. Different experiments with various vocabulary sizes and language models have been conducted. Word Error Rate and Perplexity values are compared to show the benefit of specific language models suited to the handwritten mail recognition task.
- Published
- 2012
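As a rough illustration of the bigram language model and perplexity measure mentioned in the abstract above, the sketch below trains bigram counts on a few transcriptions and scores a sentence. It is not the paper's system: the function names, the whitespace tokenization, the `<s>` / `</s>` boundary markers, and the add-one smoothing are all assumptions, since the abstract does not describe the smoothing or vocabulary handling used.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def train_bigram(sentences: Iterable[str]) -> Tuple[Counter, Counter, int]:
    """Count histories and bigrams over whitespace-tokenized transcriptions."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocabulary = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocabulary.update(tokens)
        history_counts.update(tokens[:-1])            # every token that has a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, len(vocabulary)


def perplexity(sentence: str, history_counts: Counter,
               bigram_counts: Counter, vocab_size: int) -> float:
    """Perplexity of one sentence under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing (an assumption) keeps unseen bigrams above zero.
        p = (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical training transcriptions.
model = train_bigram(["dear sir", "dear madam"])
seen = perplexity("dear sir", *model)
unseen = perplexity("sir dear", *model)
```

A word order that appears in the training transcriptions receives a lower perplexity than an unseen one, which is the sense in which a model built from in-domain transcriptions fits the task.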