From Publishers' Dictionaries to Standardized XML Representations (Des dictionnaires éditoriaux aux représentations XML standardisées)
- Source :
- Gala, Nuria and Zock, Michael. Ressources Lexicales : contenu, construction, utilisation, évaluation. John Benjamins, pp. 24, 2013. ⟨10.1075/lis.30.08man⟩
- Publication Year :
- 2013
- Publisher :
- HAL CCSD, 2013.
Abstract
- Creating an electronic dictionary from scratch is expensive, because the task mobilizes, over a long period, the work of skilled contributors trained, if not in lexicology, at least in linguistics. The use of specialized computer tools is essential for resources intended for natural language processing programs. When the socio-economic environment cannot provide the means needed to draft an electronic dictionary but printed dictionaries exist, these dictionaries are an important resource for initializing the creation of electronic lexical resources. This paper presents theoretical and practical aspects of converting publishers' dictionaries into electronic lexical resources. It takes into account constraints on economic resources, on technology, and on the availability of qualified people. Our field experiments concern under-resourced languages, mainly in Southeast Asia (Khmer, Malay, Vietnamese) and the Sahel (Bambara, Hausa, Kanuri, Tamajaq, Zarma), and most of the examples and sociolinguistic situations described in the paper relate to these areas. After a brief history of the formats and related technologies used for electronic dictionaries (SGML, XML, XSLT and CSS), we present two standards dedicated to them: the Text Encoding Initiative (TEI) and the Lexical Markup Framework (LMF). The situation of under-resourced languages is then described, followed by some examples of published dictionaries. The main technical challenges are detailed, such as the lack of standardization of the alphabets and special characters used (outside the traditional Latin range). The conversion methodology is first outlined and then detailed. Conversion to an intermediate "bridge" format in XML can be done with regular expressions or with specialized tools; the bridge format is then converted into the LMF target format. The last part is dedicated to consulting the resources through an online resource-management platform.
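The two-step conversion summarized in the abstract (regular expressions to an XML bridge format, then bridge format to LMF) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the one-line entry layout `headword (pos) gloss.`, the bridge element names (`dictionary`, `entry`, `headword`, `pos`, `gloss`), the helper names `to_bridge`, `feat` and `bridge_to_lmf`, and the two sample entries are all hypothetical. The output is a simplified LMF-style subset (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`, `Definition`) using `<feat att="..." val="..."/>` elements, and Unicode NFC normalization stands in for the special-character handling the abstract mentions.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical one-line entry layout (an assumption, not taken from the paper):
#   headword (part-of-speech) gloss.
ENTRY_RE = re.compile(r"^(?P<headword>[^\s(]+)\s+\((?P<pos>[^)]+)\)\s+(?P<gloss>.+?)\.?$")


def to_bridge(lines):
    """Step 1: dictionary text lines -> intermediate 'bridge' XML via a regular expression."""
    root = ET.Element("dictionary")
    for line in lines:
        # NFC makes precomposed and combining-sequence spellings of a character identical.
        line = unicodedata.normalize("NFC", line.strip())
        match = ENTRY_RE.match(line)
        if match is None:
            continue  # unmatched lines are skipped in this sketch
        entry = ET.SubElement(root, "entry")
        for field in ("headword", "pos", "gloss"):
            ET.SubElement(entry, field).text = match.group(field)
    return root


def feat(parent, att, val):
    """Append an LMF-style <feat att="..." val="..."/> child to parent."""
    ET.SubElement(parent, "feat", att=att, val=val)


def bridge_to_lmf(bridge, language):
    """Step 2: bridge XML -> simplified LMF-style XML (assumed subset of elements)."""
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon")
    feat(lexicon, "language", language)
    for i, entry in enumerate(bridge.iter("entry"), start=1):
        lex_entry = ET.SubElement(lexicon, "LexicalEntry", id=f"e{i}")
        feat(lex_entry, "partOfSpeech", entry.findtext("pos"))
        lemma = ET.SubElement(lex_entry, "Lemma")
        feat(lemma, "writtenForm", entry.findtext("headword"))
        sense = ET.SubElement(lex_entry, "Sense", id=f"e{i}_s1")
        definition = ET.SubElement(sense, "Definition")
        feat(definition, "text", entry.findtext("gloss"))
    return resource


if __name__ == "__main__":
    # Illustrative Bambara-French sample lines (hypothetical input, not from the paper).
    sample = ["ji (n.) eau.", "dugu (n.) village."]
    lmf = bridge_to_lmf(to_bridge(sample), language="bm")
    ET.indent(lmf)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(lmf, encoding="unicode"))
```

Running the script prints the LMF-style tree for the two sample entries. Printed dictionaries rarely fit a single regular expression; this sketch simply skips unmatched lines, whereas a practical workflow would collect them for manual correction before the LMF step.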
- Subjects :
- 060201 languages & linguistics
- computer.internet_protocol
- Computer science
- Lexicology
- 06 humanities and the arts
- 02 engineering and technology
- XSLT
- computer.file_format
- XML
- [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- World Wide Web
- Resource (project management)
- Electronic dictionary
- 0602 languages and literature
- 0202 electrical engineering, electronic engineering, information engineering
- Lexical Markup Framework
- LMF
- 020201 artificial intelligence & image processing
- Resource management
- SGML
- computer
- DiLAF
- computer.programming_language
Details
- Language :
- French
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....8edcbcdfda696560a8b5221976537268