Author: "Lefever, Els" / Journal: terminology - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Lefever, Els"' in the journal Terminology returned 5 results.

1. Tagging terms in text: A supervised sequential labelling approach to automatic term extraction.

Author: Rigouts Terryn, Ayla, Hoste, Véronique, and Lefever, Els
Subjects: NATURAL language processing, RECURRENT neural networks, MACHINE learning, RANDOM fields, CONDITIONALS (Logic)
Abstract: As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology, by first extracting a list of unique candidate terms, and classifying these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: one feature-based conditional random fields classifier and one embedding-based recurrent neural network. An additional comparison was added with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies were proven to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform all of them separately, showing new ways to push the state-of-the-art in ATE. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
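The sequential approach in this abstract labels each token occurrence in its context instead of classifying a list of unique candidate terms. Here is a minimal sketch of how annotated terms can be turned into per-token labels and back. It assumes a simple IOB scheme (B = first token of a term, I = inside a term, O = outside) and whitespace tokenization. The abstract does not specify the label set, and `to_iob`, `from_iob`, and the example sentence are hypothetical.

```python
from typing import List, Tuple

def to_iob(tokens: List[str], term_spans: List[Tuple[int, int]]) -> List[str]:
    """Assign an IOB label to every token.

    term_spans holds (start, end) token indices, end exclusive,
    one pair per annotated term occurrence (an assumed input format).
    """
    labels = ["O"] * len(tokens)
    for start, end in term_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels

def from_iob(tokens: List[str], labels: List[str]) -> List[str]:
    """Recover term occurrences from a predicted label sequence.

    An "I" that does not follow a "B" or another "I" is ignored.
    """
    terms: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms

tokens = "the heart rate of patients with heart failure".split()
labels = to_iob(tokens, [(1, 3), (6, 8)])
print(labels)                    # ['O', 'B', 'I', 'O', 'O', 'O', 'B', 'I']
print(from_iob(tokens, labels))  # ['heart rate', 'heart failure']
```

Each occurrence gets its own label. A sequence classifier such as a CRF or an RNN can therefore mark the same string as a term in one context and not in another, which a list of unique candidates cannot express.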

2. HAMLET: Hybrid Adaptable Machine Learning approach to Extract Terminology.

Author: Rigouts Terryn, Ayla, Hoste, Véronique, and Lefever, Els
Subjects: MACHINE learning, SUPERVISED learning, NATURAL language processing, TERMS & phrases, RANDOM forest algorithms
Abstract: Automatic term extraction (ATE) is an important task within natural language processing, both separately, and as a preprocessing step for other tasks. In recent years, research has moved far beyond the traditional hybrid approach where candidate terms are extracted based on part-of-speech patterns and filtered and sorted with statistical termhood and unithood measures. While there has been an explosion of different types of features and algorithms, including machine learning methodologies, some of the fundamental problems remain unsolved, such as the ambiguous nature of the concept "term". This has been a hurdle in the creation of data for ATE, meaning that datasets for both training and testing are scarce, and system evaluations are often limited and rarely cover multiple languages and domains. The ACTER Annotated Corpora for Term Extraction Research contain manual term annotations in four domains and three languages and have been used to investigate a supervised machine learning approach for ATE, using a binary random forest classifier with multiple types of features. The resulting system (HAMLET Hybrid Adaptable Machine Learning approach to Extract Terminology) provides detailed insights into its strengths and weaknesses. It highlights a certain unpredictability as an important drawback of machine learning methodologies, but also shows how the system appears to have learnt a robust definition of terms, producing results that are state-of-the-art, and contain few errors that are not (part of) terms in any way. Both the amount and the relevance of the training data have a substantial effect on results, and by varying the training data, it appears to be possible to adapt the system to various desired outputs, e.g., different types of terms. While certain issues remain difficult – such as the extraction of rare terms and multiword terms – this study shows how supervised machine learning is a promising methodology for ATE. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
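The abstract contrasts HAMLET with the traditional hybrid approach. That approach extracts candidate terms with part-of-speech patterns and then ranks them with statistical measures. Here is a minimal sketch of that baseline, not of HAMLET itself. The tag names, the `[AN]*N` pattern, and the "weirdness"-style frequency ratio (with add-one smoothing) are illustrative assumptions. The input is assumed to be already POS-tagged, and all counts are made up.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical pattern: any run of adjectives/nouns that ends in a noun,
# e.g. "random forest classifier". Each tag is mapped to one character so a
# regular expression can be matched against the tag sequence.
TAG_CODES = {"ADJ": "A", "NOUN": "N"}
PATTERN = re.compile(r"[AN]*N")

def extract_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the token spans whose POS sequence matches PATTERN."""
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    return [
        " ".join(word.lower() for word, _ in tagged[m.start():m.end()])
        for m in PATTERN.finditer(codes)
    ]

def weirdness(candidate: str, domain: Counter, reference: Counter) -> float:
    """Relative frequency in the domain corpus divided by relative frequency
    in a reference corpus; add-one smoothing on the reference side avoids
    division by zero for candidates absent from the reference corpus."""
    domain_rel = domain[candidate] / sum(domain.values())
    reference_rel = (reference[candidate] + 1) / (sum(reference.values()) + 1)
    return domain_rel / reference_rel

sentence = [("The", "DET"), ("random", "ADJ"), ("forest", "NOUN"),
            ("classifier", "NOUN"), ("uses", "VERB"), ("features", "NOUN")]
print(extract_candidates(sentence))  # ['random forest classifier', 'features']

# Made-up candidate counts in a domain corpus and a general reference corpus.
domain = Counter({"random forest classifier": 4, "features": 6})
reference = Counter({"features": 9, "weather": 1})
ranked = sorted(domain, key=lambda c: weirdness(c, domain, reference), reverse=True)
print(ranked)  # ['random forest classifier', 'features']
```

In HAMLET, a binary random forest classifier combines multiple types of features. A ratio like this one could be one such feature rather than the final ranking.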

3. HypoTerm.

Author: Lefever, Els, Van de Kauter, Marjan, and Hoste, Véronique
Subjects: *SEMANTICS, *TERMS & phrases, *COMPARATIVE linguistics, *INFORMATION theory, *DUTCH language
Abstract: HypoTerm is a data-driven semantic relation finder that starts from a list of automatically extracted domain- and user-specific terms from technical corpora, and generates a list of relations between these terms. This research study focused on the detection of hypernym relations between relevant terms and named entities. In order to detect all relevant hypernym relations in technical texts, we combined a lexico-syntactic pattern-based approach and a morpho-syntactic analyzer. To evaluate our relation finder, we constructed and manually annotated gold standard data for the dredging and financial domain in Dutch and English. The experimental results show that the HypoTerm system achieves high precision and recall figures for technical texts when starting from valid domain-specific terms and named entities. Thanks to this data-driven approach, it is possible to take an important step from terminology to concept extraction without using any external lexico-semantic resources. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF
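HypoTerm combines a lexico-syntactic pattern-based approach with morpho-syntactic analysis. This is a minimal illustration of both ideas under stated assumptions, not the HypoTerm implementation. It uses a single Hearst-style "X such as Y" pattern, an assumed example pattern, and a simple head-matching rule for space-separated English compounds. Both rules start from a given term list, as the system does. The function names and the dredging-domain terms are hypothetical.

```python
import re
from typing import Set, Tuple

Pair = Tuple[str, str]  # (hyponym, hypernym)

def pattern_hypernyms(sentence: str, terms: Set[str]) -> Set[Pair]:
    """Apply one lexico-syntactic pattern, 'X such as Y1, Y2 and Y3',
    keeping only pairs in which both sides are known terms."""
    head, sep, tail = sentence.lower().partition(" such as ")
    if not sep:
        return set()
    # The hypernym is the longest known term that ends the text before "such as".
    matches = [t for t in terms if head == t or head.endswith(" " + t)]
    if not matches:
        return set()
    hypernym = max(matches, key=len)
    # The hyponyms are the items listed after "such as".
    items = re.split(r",\s*|\s+and\s+|\s+or\s+", tail.rstrip(". "))
    return {(item.strip(), hypernym) for item in items if item.strip() in terms}

def head_hypernyms(terms: Set[str]) -> Set[Pair]:
    """Treat the longest known right-hand part of a multiword term as its
    hypernym, e.g. 'cutter suction dredgers' -> 'suction dredgers'."""
    pairs = set()
    for term in terms:
        words = term.split()
        for i in range(1, len(words)):
            candidate = " ".join(words[i:])
            if candidate in terms:
                pairs.add((term, candidate))
                break  # i grows, so the first hit is the longest known head
    return pairs

terms = {"vessels", "dredgers", "barges", "suction dredgers", "cutter suction dredgers"}
sentence = "The contractor deployed vessels such as dredgers and barges."
relations = pattern_hypernyms(sentence, terms) | head_hypernyms(terms)
for hyponym, hypernym in sorted(relations):
    print(f"{hyponym} IS-A {hypernym}")
```

Dutch closed (single-word) compounds would need decompounding before a head rule like this could apply. That is presumably the role of a morpho-syntactic analyzer in such a system.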

4. TExSIS: Bilingual terminology extraction from parallel corpora using chunk-based alignment.

Author: Macken, Lieve, Lefever, Els, and Hoste, Véronique
Subjects: *BILINGUALISM, *TERMS & phrases, *EXTRACTION (Linguistics), *FRENCH language, *ITALIAN language, *ENGLISH language, *DUTCH language
Abstract: We report on TExSIS, a flexible bilingual terminology extraction system that uses a sophisticated chunk-based alignment method for the generation of candidate terms, after which the specificity of the candidate terms is determined by combining several statistical filters. Although the set-up of the architecture is largely language-independent, we present terminology extraction results for four different languages and three language pairs. Gold standard data sets were created for French-Italian, French-English and French-Dutch, which allowed us not only to evaluate precision, which is common practice, but also recall. We compared the TExSIS approach, which takes a multilingual perspective from the start, with the more commonly used approach of first identifying term candidates monolingually and then aligning the source and target terms. A comparison of our system with the LUIZ approach described by Vintar (2010) reveals that TExSIS outperforms LUIZ both for monolingual and bilingual terminology extraction. Our results also clearly show that the precision of the alignment is crucial for the success of the terminology extraction. Furthermore, based on the observation that the precision scores for bilingual terminology extraction outperform those of the monolingual systems, we conclude that multilingual evidence helps to determine unithood in less related languages. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF
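The TExSIS abstract stresses that gold-standard data made it possible to measure recall as well as precision. Here is a minimal sketch of exact-match scoring over (source, target) term pairs. Exact matching, the `evaluate` helper, and the French-English pairs are illustrative assumptions, since the abstract does not state how partial matches were scored.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (source term, target term)

def evaluate(extracted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of extracted term pairs."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical French-English gold standard and system output.
gold = {("pompe hydraulique", "hydraulic pump"),
        ("soupape de sécurité", "safety valve"),
        ("joint torique", "o-ring"),
        ("filtre à huile", "oil filter")}
extracted = {("pompe hydraulique", "hydraulic pump"),
             ("soupape de sécurité", "safety valve"),
             ("joint", "seal")}

p, r, f = evaluate(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Recall needs the complete gold set. Without it, only precision (the share of extracted pairs that are correct) can be estimated, for instance by manually checking system output.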

5. Classification-based scientific term detection in patient information.

Author: Hoste, Véronique, Vanopstal, Klaar, Lefever, Els, and Delaere, Isabelle
Subjects: ORTHOGRAPHY & spelling, READABILITY (Literary style), TERMS & phrases, NAMES, DRUG information materials, MORPHOLOGY
Abstract: Although intended for the “average layman”, both in terms of readability and contents, the current patient information still contains many scientific terms. Different studies have concluded that the use of scientific terminology is one of the factors that greatly influence the readability of this patient information. The present study deals with the problem of automatic term recognition of overly scientific terminology as a first step towards the replacement of the recognized scientific terms by their popular counterparts. In order to do so, we experimented with two approaches, a dictionary-based approach and a learning-based approach, which is trained on a rich feature vector. The research was conducted on a bilingual corpus of English and Dutch EPARs (European Public Assessment Reports). Our results show that we can extract scientific terms with a high accuracy (> 80%, 10% below human performance) for both languages. Furthermore, we show that a lexicon-independent approach, which relies solely on orthographical and morphological information, is the most powerful predictor of the scientific character of a given term. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF
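The abstract reports that a lexicon-independent approach based only on orthographic and morphological information was the strongest predictor of scientific terms. Here is a minimal sketch of what such dictionary-free features could look like. The affix lists, the digraph feature, and the `orthographic_features` function are hypothetical, since the abstract does not list the study's actual feature set.

```python
from typing import Dict

# Hypothetical Greek/Latin combining forms common in scientific medical terms.
SCI_PREFIXES = ("hepat", "nephr", "cardi", "gastr", "neur", "hyper", "hypo")
SCI_SUFFIXES = ("itis", "algia", "emia", "osis", "ectomy", "pathy")
# Hypothetical spellings typical of words of Greek or Latin origin.
CLASSICAL_DIGRAPHS = ("ph", "rh", "ae", "oe")

def orthographic_features(word: str) -> Dict[str, float]:
    """Dictionary-free features computed only from the spelling of one word."""
    w = word.lower()
    return {
        "length": float(len(w)),
        "has_sci_prefix": float(w.startswith(SCI_PREFIXES)),
        "has_sci_suffix": float(w.endswith(SCI_SUFFIXES)),
        "has_classical_digraph": float(any(d in w for d in CLASSICAL_DIGRAPHS)),
        "vowel_ratio": sum(c in "aeiouy" for c in w) / len(w) if w else 0.0,
    }

for word in ("hepatitis", "headache", "haemorrhage"):
    print(word, orthographic_features(word))
```

Feature dictionaries like these would then be vectorized and passed to a classifier. The abstract does not name the learner, so that choice is left open here.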
