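Using lexical information automatically extracted from corpora in the computational syntactic parsing of Portuguese [original title: Utilização de informações lexicais extraídas automaticamente de corpora na análise sintática computacional do português].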

Authors :
Alencar, Leonel Figueiredo De
Source :
Revista de Estudos da Linguagem. Jan-Jun 2011, Vol. 19, Issue 1, pp. 7-85 (79 pages).
Publication Year :
2011

Abstract

Lexicon modeling is the main difficulty to overcome when building deep syntactic parsers for unrestricted text. Traditionally, two strategies have been used to handle lexical information in unrestricted syntactic parsing: compiling thousands of lexical entries or formulating hundreds of morphological rules. Because of productive word-formation processes, proper names, and non-standard spellings, the former strategy, which freely downloadable parsers for Brazilian Portuguese (BP) rely on, is not robust. The latter, on the other hand, is a time-intensive and non-trivial knowledge engineering task. At present, there is no wide-coverage parser for BP under an open-source license. Aiming to fill this gap as soon as possible, we argue in this paper that a much less expensive and much more efficient solution to the lexicon bottleneck in parsing is simply to reuse freely available morphosyntactic taggers as the system's lexical analyzer. Moreover, thanks to the free and broad availability of POS-tagged corpora for BP and of efficient machine learning packages, building additional highly accurate taggers has become an almost effortless task. To easily integrate the output of taggers built on different architectures into context-free grammar chart parsers compiled with the Natural Language Toolkit (NLTK), we have developed a Python module named ALEXP. To the best of our knowledge, this is the first free software specially optimized for processing Portuguese that accomplishes this task. The tool's functionality is described by means of BP grammar prototypes applied to parsing real-world sentences, with very promising results. [ABSTRACT FROM AUTHOR]
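
The core idea of the abstract, letting a POS tagger stand in for the parser's lexicon, can be illustrated with a minimal sketch. This is not ALEXP and does not use NLTK: it is a standard-library CKY recognizer written only for illustration. The toy grammar, the tag names (ART, N, V), and the function name `recognize` are all assumptions, and the tagged sentences are hand-written stand-ins for real tagger output.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form. It contains only
# phrase-structure rules over POS tags and no lexical entries at all.
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("ART", "N"): "NP",
    ("V", "NP"): "VP",
}


def recognize(tagged_sentence, start="S"):
    """CKY recognition in which the tagger's output replaces the lexicon."""
    n = len(tagged_sentence)
    # chart[i][j] holds the categories spanning tokens i..j (j exclusive).
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Lexical step: each word contributes only the tag assigned by the tagger.
    for i, (_word, tag) in enumerate(tagged_sentence):
        chart[i][i + 1].add(tag)
    # Syntactic step: combine adjacent spans bottom-up.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    parent = BINARY_RULES.get((left, right))
                    if parent:
                        chart[i][j].add(parent)
    return n > 0 and start in chart[0][n]


# Hand-written stand-in for tagger output (assumed tag names).
tagged = [("o", "ART"), ("menino", "N"), ("viu", "V"),
          ("a", "ART"), ("menina", "N")]
# Recent coinages need no lexicon update, as long as the tagger labels them.
neologisms = [("o", "ART"), ("blogueiro", "N"), ("tuitou", "V"),
              ("a", "ART"), ("notícia", "N")]

print(recognize(tagged), recognize(neologisms))
```

Because the grammar never mentions individual words, coverage of new vocabulary depends entirely on the tagger, which is the trade-off the abstract argues for.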

Details

Language :
Portuguese
ISSN :
0104-0588
Volume :
19
Issue :
1
Database :
Academic Search Index
Journal :
Revista de Estudos da Linguagem
Publication Type :
Academic Journal
Accession number :
70459835
Full Text :
https://doi.org/10.17851/2237-2083.19.1.7-85