
Des dictionnaires éditoriaux aux représentations XML standardisées [From publishing dictionaries to standardized XML representations]

Authors :
Mathieu Mangeot
Chantal Enguehard
Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP)
Laboratoire d'Informatique de Grenoble (LIG)
Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)
Université Savoie Mont Blanc (USMB [Université de Savoie] [Université de Chambéry])
Laboratoire d'Informatique de Nantes Atlantique (LINA)
Centre National de la Recherche Scientifique (CNRS)-Mines Nantes (Mines Nantes)-Université de Nantes (UN)
Nuria Gala and Michael Zock (volume editors)
Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)
Source :
Nuria Gala and Michael Zock (eds.), Ressources Lexicales : contenu, construction, utilisation, évaluation, John Benjamins, pp.24, 2013, ⟨10.1075/lis.30.08man⟩
Publication Year :
2013
Publisher :
HAL CCSD, 2013.

Abstract

Creating an electronic dictionary from scratch is expensive because it mobilizes, over a long period, skilled contributors trained in lexicology or at least in linguistics. Specialized computer tools are essential when the resources are meant for use by natural language processing programs. When the socio-economic environment cannot supply the means needed to draft an electronic dictionary but printed dictionaries exist, those dictionaries are an important resource for bootstrapping electronic lexical resources. This paper presents theoretical and practical aspects of converting publishing dictionaries into electronic lexical resources, taking into account constraints on economic resources, technology, and the availability of qualified people. Our field experiments concern under-resourced languages, mainly in Southeast Asia (Khmer, Malay, Vietnamese) and the Sahel (Bambara, Hausa, Kanuri, Tamajaq, Zarma), so most of the examples and sociolinguistic situations described in the paper relate to these areas. After a brief history of electronic dictionary formats and related technologies (SGML, XML, XSLT and CSS), we present two standards dedicated to them: the Text Encoding Initiative (TEI) and the Lexical Markup Framework (LMF). We then set out the issue of under-resourced languages, followed by some examples of published dictionaries. The main technical challenges are detailed, such as the lack of standardization of the alphabets used and of special characters (outside the traditional Latin range). The conversion methodology is outlined and then detailed: the source is first converted to a bridge format in XML, using regular expressions or specialized tools, and the bridge format is then converted into the target LMF format. The last part is dedicated to consulting the resources through an online resource-management platform.
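The two-step conversion mentioned in the abstract (source text to an XML bridge format by regular expressions, then bridge format to LMF) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' tooling: the entry layout "headword (pos) definition", the bridge element names (`entry`, `headword`, `pos`, `definition`), and the sample entry are all hypothetical, and the output only approximates the LMF feature style (`feat` elements with `att`/`val`). Unicode NFC normalization is included as one possible response to the special-character issue the abstract raises.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical layout of one typeset entry: "headword (pos) definition".
ENTRY_RE = re.compile(
    r"^(?P<headword>\S+)\s+\((?P<pos>[^)]+)\)\s+(?P<definition>.+)$"
)


def line_to_bridge(line):
    """Step 1: turn one dictionary line into a bridge-format <entry>."""
    # NFC gives base letter + combining mark sequences one composed form,
    # so the same word is always encoded the same way.
    normalized = unicodedata.normalize("NFC", line.strip())
    match = ENTRY_RE.match(normalized)
    if match is None:
        raise ValueError(f"unrecognized entry layout: {line!r}")
    entry = ET.Element("entry")  # bridge element names are assumptions
    for name in ("headword", "pos", "definition"):
        ET.SubElement(entry, name).text = match.group(name)
    return entry


def bridge_to_lmf(entry):
    """Step 2: map a bridge <entry> onto an LMF-style <LexicalEntry>."""
    lexical_entry = ET.Element("LexicalEntry")
    ET.SubElement(lexical_entry, "feat",
                  att="partOfSpeech", val=entry.findtext("pos"))
    lemma = ET.SubElement(lexical_entry, "Lemma")
    ET.SubElement(lemma, "feat",
                  att="writtenForm", val=entry.findtext("headword"))
    sense = ET.SubElement(lexical_entry, "Sense")
    definition = ET.SubElement(sense, "Definition")
    ET.SubElement(definition, "feat",
                  att="text", val=entry.findtext("definition"))
    return lexical_entry


# Made-up sample entry, for illustration only.
bridge = line_to_bridge("maison (n.) building where people live")
print(ET.tostring(bridge_to_lmf(bridge), encoding="unicode"))
```

Keeping the bridge format separate from the target format means that only the regular expression needs adapting to each printed dictionary's layout, while the bridge-to-LMF step can be shared across dictionaries.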

Details

Language :
French
Database :
OpenAIRE
Journal :
Nuria Gala and Michael Zock (eds.), Ressources Lexicales : contenu, construction, utilisation, évaluation, John Benjamins, pp.24, 2013, ⟨10.1075/lis.30.08man⟩
Accession number :
edsair.doi.dedup.....8edcbcdfda696560a8b5221976537268