Author: "Lorraine Goeuriot" / Publisher: association for computational linguistics - Searchworks@Jio Institute Digital Library Search Results

1. Compilation of specialized comparable corpora in French and Japanese

Authors: Emmanuel Morin, Lorraine Goeuriot, and Béatrice Daille
Subjects: Typology, Shallow parsing, Information retrieval, Computer science, Comparability, Domain (software engineering), Scientific domain, Quality (business), Artificial intelligence, IBM, Popular science, Natural language processing
Abstract: We present the development of a tool for compiling specialized comparable corpora whose quality is intended to be close to that of a manually compiled corpus. Comparability is defined on three levels: domain, topic, and type of discourse. Domain and topic can be filtered using the keywords of a web search, but detecting the type of discourse requires a broader linguistic analysis. The first step of our work is to automate the detection of the types of discourse found in a scientific domain (science and popular science) in French and Japanese. First, a contrastive stylistic analysis of the two types of discourse is carried out in both languages. This analysis leads to a reusable, generic, and robust typology. Machine learning algorithms are then applied to the typology, using shallow parsing. We obtain good results, with an average precision of 80% and an average recall of 70%, which demonstrate the efficiency of this typology. The classification tool is then integrated into a corpus compilation tool, a text collection processing chain implemented with the IBM UIMA system. Starting from two collections of specialized web documents in French and Japanese, this tool creates the corresponding corpus.
Published: 2009
Full Text: View/download PDF
