Author: "Hugo Gonçalo Oliveira" / Publisher: springer international publishing - Searchworks@Jio Institute Digital Library Search Results

1. Drilling Lexico-Semantic Knowledge in Portuguese from BERT

Author: Hugo Gonçalo Oliveira
Published: 2022
Full Text: View/download PDF

2. Answering Fill-in-the-Blank Questions in Portuguese with Transformer Language Models

Author: Hugo Gonçalo Oliveira
Subjects: Sequence, business.industry, Semantics (computer science), Computer science, computer.software_genre, Blank, language.human_language, language, Language modelling, Language model, Artificial intelligence, Portuguese, business, computer, Natural language processing, Sentence, Transformer (machine learning model)
Abstract: Despite different applications, transformer-based language models, like BERT and GPT, learn about language by predicting missing parts of text. BERT is pretrained in Masked Language Modelling and GPT generates text from a given sequence. We explore such models for answering cloze questions in Portuguese, following different approaches. When options are not considered, the largest BERT model, trained exclusively for Portuguese, is the most accurate. But when selecting the best option, top performance is achieved by computing the most probable sentence, and GPT-2 fine-tuned for Portuguese beats BERT.
Published: 2021
Full Text: View/download PDF

3. Exploring Portuguese Word Embeddings for Discovering Lexical-Semantic Relations

Author: Ana Alves, Hugo Gonçalo Oliveira, and Tiago Sousa
Subjects: business.industry, Computer science, 020206 networking & telecommunications, 02 engineering and technology, computer.software_genre, language.human_language, 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Artificial intelligence, Portuguese, business, computer, Natural language processing, Word (computer architecture)
Abstract: Word2vec-like word embeddings are known for keeping linguistic regularities and thus good for solving analogies. Following this, we explore such embeddings for Portuguese in the discovery of lexical-semantic relations, which can be used for augmenting lexical-semantic knowledge bases. In this exploratory approach, we tested different methods for discovering relations of different types and confirm that word embeddings can be used, at least, for suggesting new candidate relations.
Published: 2020
Full Text: View/download PDF

4. Leveraging on Semantic Textual Similarity for Developing a Portuguese Dialogue System

Author: Ana Alves, José D. Santos, and Hugo Gonçalo Oliveira
Subjects: Information retrieval, Computer science, 02 engineering and technology, language.human_language, Set (abstract data type), Semantic similarity, Order (business), 020204 information systems, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, Question answering, language, 020201 artificial intelligence & image processing, Portuguese
Abstract: We describe an IR-based dialogue system that, in order to match user interactions with FAQs on a list, leverages on a model for computing the semantic similarity between two fragments of Portuguese text. It was mainly used for answering questions about the economic activity in Portugal and, when no FAQ has a higher score than a threshold, it may search for similar interactions in a corpus of movie subtitles and still tries to give a suitable response. Besides describing the underlying model and its integration, we assess it when answering variations of FAQs and report on an experiment to set the aforementioned threshold.
Published: 2020
Full Text: View/download PDF

5. The ASSIN 2 Shared Task: A Quick Overview

Author: Livy Real, Hugo Gonçalo Oliveira, and Erick Rocha Fonseca
Subjects: business.industry, Computer science, 02 engineering and technology, computer.software_genre, language.human_language, Task (project management), Natural language inference, 020204 information systems, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Artificial intelligence, Portuguese, Textual entailment, business, computer, Natural language processing
Abstract: This paper offers a brief overview of ASSIN 2, an evaluation shared task co-located with STIL 2019. ASSIN 2 covered two different but related tasks: Recognizing Textual Entailment (RTE), also known as Natural Language Inference (NLI), and Semantic Textual Similarity (STS). The ASSIN 2 collection was made of pairs of sentences annotated with human judgments for NLI and STS. Participating teams could take part in any of the tasks or both: nine teams participated in the STS task and eight in the NLI task.
Published: 2020
Full Text: View/download PDF

6. Exploring Emojis for Emotion Recognition in Portuguese Text

Author: Luis Duarte, Hugo Gonçalo Oliveira, and Luís Macedo
Subjects: Feature engineering, Exploit, Computer science, Emoji, business.industry, 05 social sciences, 02 engineering and technology, computer.software_genre, 050105 experimental psychology, Task (project management), Support vector machine, Naive Bayes classifier, ComputingMethodologies_PATTERNRECOGNITION, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0501 psychology and cognitive sciences, Social media, Artificial intelligence, business, computer, Natural language processing
Abstract: New forms of communication, like emojis, are frequent today in social media. Having in mind their strong connection with expressed emotions, we exploit emojis towards the creation of a model for emotion recognition in Portuguese. We gather short texts from Twitter and follow a traditional text classification task, where emojis are used as labels. After the process of feature engineering, two types of Naive Bayes and SVM classifiers are trained: one for classifying emotion, based on related emojis; another for predicting emojis. Interesting but debatable results were obtained on the former task, while the latter revealed to be more challenging, mainly due to emoji similarity. Yet, this also suggests that we can rely on them as an alternative to manually labelling emotions.
Published: 2019
Full Text: View/download PDF

7. Recognizing Humor in Portuguese: First Steps

Author: Ana Alves, Hugo Gonçalo Oliveira, and André Clemêncio
Subjects: Computational model, Computer science, business.industry, Computational humor, 020206 networking & telecommunications, 02 engineering and technology, computer.software_genre, language.human_language, Domain (software engineering), Style (sociolinguistics), Verbal expression, 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Artificial intelligence, Portuguese, business, Set (psychology), computer, Natural language processing
Abstract: Within the domain of Artificial Intelligence, humor has been a research topic for some time, but the automatic recognition of its verbal expression has never been tackled for Portuguese. This work aims to change this scenario. We describe a set of experiments towards the development of computational models that recognize humor written in Portuguese, based on content and humor-specific features extracted. Interesting results, with F1-scores up to 0.93, are achieved when classifiers for this purpose are trained and tested on texts with a similar style (question-answers or news headlines). Yet, when the testing examples are of a different style, results are poor, which suggests that much more has to be done towards effective humor recognition.
Published: 2019
Full Text: View/download PDF

8. Named Entity Recognition in Portuguese Neurology Text Using CRF

Author: Cesar Teixeira, Hugo Gonçalo Oliveira, and Fábio Lopes
Subjects: Conditional random field, medicine.medical_specialty, education.field_of_study, Neurology, Interpretation (logic), 020205 medical informatics, business.industry, Computer science, Population, 02 engineering and technology, computer.software_genre, language.human_language, Task (project management), Named-entity recognition, 0202 electrical engineering, electronic engineering, information engineering, medicine, language, 020201 artificial intelligence & image processing, Artificial intelligence, Health information, Portuguese, business, education, computer, Natural language processing
Abstract: Automatic recognition of named entities from clinical text lightens the work of health professionals by helping in the interpretation and easing tasks such as the population of databases with patient health information. In this study, we evaluated the performance of Conditional Random Fields, a sequence labelling model, for extracting entities from neurology clinical texts written in Portuguese. More than achieving F1-scores of about 73% or 80%, respectively for a relaxed or strict evaluation, the more discriminant features in this task were also analyzed.
Published: 2019
Full Text: View/download PDF

9. Learning Word Embeddings from Portuguese Lexical-Semantic Knowledge Bases

Author: Hugo Gonçalo Oliveira
Subjects: Lexical semantics, Computer science, business.industry, 02 engineering and technology, computer.software_genre, language.human_language, Knowledge base, Semantic similarity, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, language, Semantic memory, Graph (abstract data type), 020201 artificial intelligence & image processing, Artificial intelligence, Portuguese, business, computer, Natural language processing
Abstract: This paper describes the creation of PT-LKB, new Portuguese word embeddings learned from a large lexical-semantic knowledge base (LKB), using the node2vec method. Resulting embeddings combine the strengths of word vector representations and, even with lower dimensions, achieve high scores in genuine similarity, which so far were obtained by exploiting the graph structure of LKBs.
Published: 2018
Full Text: View/download PDF

10. Computational Processing of the Portuguese Language

Author: Carlos Ramisch, Hugo Gonçalo Oliveira, and Renata Ramisch
Published: 2018
Full Text: View/download PDF

11. Unsupervised Approaches for Computing Word Similarity in Portuguese

Author: Hugo Gonçalo Oliveira
Subjects: Lexical semantics, Computer science, business.industry, 02 engineering and technology, computer.software_genre, language.human_language, Task (project management), Single test, 03 medical and health sciences, 0302 clinical medicine, Semantic similarity, Similarity (psychology), 030221 ophthalmology & optometry, 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Distributional semantics, Artificial intelligence, Portuguese, business, computer, Word (computer architecture), Natural language processing
Abstract: This paper presents several approaches for computing word similarity in Portuguese and is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language, available for a longer time. The previous resources were exploited to answer word similarity tests, also recently available for Portuguese. We conclude that there are several valid approaches for this task, but not one that outperforms all the others in every single test. For instance, distributional models seem to capture relatedness better, but LKBs are better suited for computing genuine similarity.
Published: 2017
Full Text: View/download PDF

12. Gradually Improving the Computation of Semantic Textual Similarity in Portuguese

Author: Ricardo Rodrigues, Ana Alves, and Hugo Gonçalo Oliveira
Subjects: business.industry, Computer science, 02 engineering and technology, computer.software_genre, Machine learning, Security token, language.human_language, SemEval, Task (project management), Semantic similarity, Negation, 020204 information systems, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Artificial intelligence, Portuguese, Unavailability, business, computer, Natural language processing
Abstract: There is much research on Semantic Textual Similarity (STS) in English, especially since its inclusion in the SemEval evaluations. For other languages, it is not as common, mostly due to the unavailability of benchmarks. Recently, the ASSIN shared task targeted STS in Portuguese and released training and test collections. This paper describes an incremental approach to ASSIN, where the computed similarity is gradually improved by exploiting different features (e.g., token overlap, semantic relations, chunks, and negation) and approaches. The best reported results, obtained with a supervised approach, would get second place overall in ASSIN.
Published: 2017
Full Text: View/download PDF

13. Automatic Generation of Internet Memes from Portuguese News Headlines

Author: Diogo Costa, Alexandre Miguel Pinto, and Hugo Gonçalo Oliveira
Subjects: Internet meme, Multimedia, Computer science, Computational humor, 06 humanities and the arts, 02 engineering and technology, 0603 philosophy, ethics and religion, computer.software_genre, language.human_language, World Wide Web, 060302 philosophy, 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Portuguese, computer
Abstract: This paper presents MemeGera, a prototype tool that generates image-based memes from Portuguese news headlines. All is done automatically, with the help of computational linguistic resources, uncovered here with the rules for selecting images and adapting the text.
Published: 2016
Full Text: View/download PDF

14. Automatic Generation of Poetry Inspired by Twitter Trends

Author: Hugo Gonçalo Oliveira
Subjects: Computational creativity, Multimedia, Poetry, Social network, Grammar, Computer science, business.industry, media_common.quotation_subject, Semantic domain, computer.software_genre, Semantic network, Social media, Artificial intelligence, business, Set (psychology), computer, Natural language processing, media_common
Abstract: This paper revisits PoeTryMe, a poetry generation platform, and presents its most recent instantiation for producing poetry inspired by trends in the Twitter social network. The presented system searches for tweets that mention a given topic, extracts the most frequent words in those tweets, and uses them as seeds for the generation of new poems. The set of seeds might still be expanded with semantically-relevant words. Generation is performed by the classic PoeTryMe system, based on a semantic network and a grammar, with a previously used generate&test strategy. Illustrative results are presented using different seed expansion settings. They show that the produced poems use semantically-coherent lines with words that, at the time of generation, were associated with the topic. Resulting poems are not really about the topic, but they are a way of expressing, poetically, what the system knows about the semantic domain set by the topic.
Published: 2016
Full Text: View/download PDF

15. CONTO.PT: Groundwork for the Automatic Creation of a Fuzzy Portuguese Wordnet

Author: Hugo Gonçalo Oliveira
Subjects: Measure (data warehouse), Computer science, business.industry, Redundancy (linguistics), media_common.quotation_subject, WordNet, 02 engineering and technology, computer.software_genre, Fuzzy logic, language.human_language, Resource (project management), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Quality (business), Artificial intelligence, Portuguese, business, computer, Natural language processing, media_common
Abstract: There are several lexical resources available for the computational processing of Portuguese, organised differently and created by different people with different approaches and limitations. This paper presents the first experiments towards the exploitation of seven of those resources in the automatic creation of a large wordnet, where numerical scores are assigned to the inclusion of words in synsets and to the connection of synsets by semantic relations. Experiments confirm that a large wordnet can indeed be created and, to some extent, computed scores can be used as a confidence measure, which will enable the users to select only a portion of the resource, depending on the needs of their application on quantity and quality of lexical-semantic knowledge.
Published: 2016
Full Text: View/download PDF

16. Towards the Improvement of a Topic Model with Semantic Knowledge

Author: Adriana Ferrugento, Hugo Gonçalo Oliveira, Filipe Rodrigues, and Ana Alves
Subjects: Topic model, Information retrieval, business.industry, Redundancy (linguistics), Computer science, WordNet, computer.software_genre, General semantics, Knowledge base, Semantic memory, Artificial intelligence, business, computer, Natural language processing, Generative grammar
Abstract: Although typically used in classic topic models, surface words cannot represent meaning on their own. Consequently, redundancy is common in those topics, which may, for instance, include synonyms. To face this problem, we present SemLDA, an extended topic model that incorporates semantics from an external lexical-semantic knowledge base. SemLDA is introduced and explained in detail, pointing out where semantics is included both in the pre-processing and generative phase of topic distributions. As a result, instead of topics as distributions over words, we obtain distributions over concepts, each represented by a set of synonymous words. In order to evaluate SemLDA, we applied preliminary qualitative tests automatically against a state-of-the-art classical topic model. The results were promising and confirm our intuition towards the benefits of incorporating general semantics in a topic model.
Published: 2015
Full Text: View/download PDF

17. On the Utility of Portuguese Term-Based Lexical-Semantic Networks

Author: Hugo Gonçalo Oliveira
Subjects: Focus (computing), Resource (project management), Computer science, language, Portuguese, Data science, language.human_language, Semantic network, Term (time)
Abstract: This paper discusses the utility of term-based lexical networks, with focus on Portuguese. Although less complex and often undervalued towards wordnets, for Portuguese, these resources have been used in varied natural language processing tasks. We enumerate those, and then introduce a larger resource of this kind together with two additional tasks where it was useful.
Published: 2014
Full Text: View/download PDF

18. The Creation of Onto.PT: A Wordnet-Like Lexical Ontology for Portuguese

Author: Hugo Gonçalo Oliveira
Subjects: Relation (database), Computer science, business.industry, WordNet, Context (language use), Ontology (information science), computer.software_genre, Relationship extraction, Lexical item, Upper ontology, Artificial intelligence, Computational linguistics, business, computer, Natural language processing
Abstract: A wordnet is an important tool for developing natural language processing applications for a language, but the manual creation of such a resource limits its development. This dissertation studied the automatic construction of Onto.PT, a large Portuguese wordnet, aiming to minimise the main limitations of existing Portuguese wordnets. In this context, we propose ECO, an approach for creating wordnets automatically from text – relation instances are extracted, synonymy clusters (synsets) are discovered, and the remaining relations are then attached to suitable synsets. This document also reports on the contents of Onto.PT, its comparison to other wordnets, and its evaluation.
Published: 2014
Full Text: View/download PDF

18 results on '"Hugo Gonçalo Oliveira"'
