18 results on '"Hugo Gonçalo Oliveira"'
Search Results
2. Answering Fill-in-the-Blank Questions in Portuguese with Transformer Language Models
- Author
- Hugo Gonçalo Oliveira
- Subjects
Sequence; business.industry; Semantics (computer science); Computer science; computer.software_genre; Blank; language.human_language; language; Language modelling; Language model; Artificial intelligence; Portuguese; business; computer; Natural language processing; Sentence; Transformer (machine learning model)
- Abstract
Despite different applications, transformer-based language models, like BERT and GPT, learn about language by predicting missing parts of text. BERT is pretrained in Masked Language Modelling and GPT generates text from a given sequence. We explore such models for answering cloze questions in Portuguese, following different approaches. When options are not considered, the largest BERT model, trained exclusively for Portuguese, is the most accurate. But when selecting the best option, top performance is achieved by computing the most probable sentence, and GPT-2 fine-tuned for Portuguese beats BERT.
- Published
- 2021
3. Exploring Portuguese Word Embeddings for Discovering Lexical-Semantic Relations
- Author
- Ana Alves, Hugo Gonçalo Oliveira, and Tiago Sousa
- Subjects
business.industry; Computer science; 020206 networking & telecommunications; 02 engineering and technology; computer.software_genre; language.human_language; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; Portuguese; business; computer; Natural language processing; Word (computer architecture)
- Abstract
Word2vec-like word embeddings are known for keeping linguistic regularities and are thus good at solving analogies. Following this, we explore such embeddings for Portuguese in the discovery of lexical-semantic relations, which can be used for augmenting lexical-semantic knowledge bases. In this exploratory approach, we tested different methods for discovering relations of different types and confirmed that word embeddings can be used, at least, for suggesting new candidate relations.
- Published
- 2020
4. Leveraging on Semantic Textual Similarity for Developing a Portuguese Dialogue System
- Author
- Ana Alves, José D. Santos, and Hugo Gonçalo Oliveira
- Subjects
Information retrieval; Computer science; 02 engineering and technology; language.human_language; Set (abstract data type); Semantic similarity; Order (business); 020204 information systems; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; Question answering; language; 020201 artificial intelligence & image processing; Portuguese
- Abstract
We describe an IR-based dialogue system that, in order to match user interactions with FAQs on a list, leverages a model for computing the semantic similarity between two fragments of Portuguese text. It was mainly used for answering questions about the economic activity in Portugal and, when no FAQ scores above a threshold, it may search for similar interactions in a corpus of movie subtitles and still try to give a suitable response. Besides describing the underlying model and its integration, we assess it when answering variations of FAQs and report on an experiment to set the aforementioned threshold.
- Published
- 2020
5. The ASSIN 2 Shared Task: A Quick Overview
- Author
- Livy Real, Hugo Gonçalo Oliveira, and Erick Rocha Fonseca
- Subjects
business.industry; Computer science; 02 engineering and technology; computer.software_genre; language.human_language; Task (project management); Natural language inference; 020204 information systems; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; Portuguese; Textual entailment; business; computer; Natural language processing
- Abstract
This paper offers a brief overview of ASSIN 2, an evaluation shared task co-located with STIL 2019. ASSIN 2 covered two different but related tasks: Recognizing Textual Entailment (RTE), also known as Natural Language Inference (NLI), and Semantic Textual Similarity (STS). The ASSIN 2 collection consisted of pairs of sentences annotated with human judgments for NLI and STS. Participating teams could take part in either task or both: nine teams participated in the STS task and eight in the NLI task.
- Published
- 2020
6. Exploring Emojis for Emotion Recognition in Portuguese Text
- Author
- Luis Duarte, Hugo Gonçalo Oliveira, and Luís Macedo
- Subjects
Feature engineering; Exploit; Computer science; Emoji; business.industry; 05 social sciences; 02 engineering and technology; computer.software_genre; 050105 experimental psychology; Task (project management); Support vector machine; Naive Bayes classifier; ComputingMethodologies_PATTERNRECOGNITION; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Social media; Artificial intelligence; business; computer; Natural language processing
- Abstract
New forms of communication, like emojis, are frequent today in social media. Bearing in mind their strong connection with expressed emotions, we exploit emojis towards the creation of a model for emotion recognition in Portuguese. We gather short texts from Twitter and follow a traditional text classification approach, where emojis are used as labels. After the process of feature engineering, two types of Naive Bayes and SVM classifiers are trained: one for classifying emotion, based on related emojis; another for predicting emojis. Interesting but debatable results were obtained on the former task, while the latter proved to be more challenging, mainly due to emoji similarity. Yet, this also suggests that we can rely on them as an alternative to manually labelling emotions.
- Published
- 2019
7. Recognizing Humor in Portuguese: First Steps
- Author
- Ana Alves, Hugo Gonçalo Oliveira, and André Clemêncio
- Subjects
Computational model; Computer science; business.industry; Computational humor; 020206 networking & telecommunications; 02 engineering and technology; computer.software_genre; language.human_language; Domain (software engineering); Style (sociolinguistics); Verbal expression; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; Portuguese; business; Set (psychology); computer; Natural language processing
- Abstract
Within the domain of Artificial Intelligence, humor has been a research topic for some time, but the automatic recognition of its verbal expression has never been tackled for Portuguese. This work aims to change this scenario. We describe a set of experiments towards the development of computational models that recognize humor written in Portuguese, based on extracted content and humor-specific features. Interesting results, with F1-scores up to 0.93, are achieved when classifiers for this purpose are trained and tested on texts with a similar style (question-answers or news headlines). Yet, when the testing examples are of a different style, results are poor, which suggests that much more has to be done towards effective humor recognition.
- Published
- 2019
8. Named Entity Recognition in Portuguese Neurology Text Using CRF
- Author
- Cesar Teixeira, Hugo Gonçalo Oliveira, and Fábio Lopes
- Subjects
Conditional random field; medicine.medical_specialty; education.field_of_study; Neurology; Interpretation (logic); 020205 medical informatics; business.industry; Computer science; Population; 02 engineering and technology; computer.software_genre; language.human_language; Task (project management); Named-entity recognition; 0202 electrical engineering, electronic engineering, information engineering; medicine; language; 020201 artificial intelligence & image processing; Artificial intelligence; Health information; Portuguese; business; education; computer; Natural language processing
- Abstract
Automatic recognition of named entities from clinical text lightens the work of health professionals by helping in the interpretation and easing tasks such as the population of databases with patient health information. In this study, we evaluated the performance of Conditional Random Fields, a sequence labelling model, for extracting entities from neurology clinical texts written in Portuguese. Besides achieving F1-scores of about 73% or 80%, respectively for a relaxed or strict evaluation, we also analyzed the most discriminant features in this task.
- Published
- 2019
9. Learning Word Embeddings from Portuguese Lexical-Semantic Knowledge Bases
- Author
- Hugo Gonçalo Oliveira
- Subjects
Lexical semantics; Computer science; business.industry; 02 engineering and technology; computer.software_genre; language.human_language; Knowledge base; Semantic similarity; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; language; Semantic memory; Graph (abstract data type); 020201 artificial intelligence & image processing; Artificial intelligence; Portuguese; business; computer; Natural language processing
- Abstract
This paper describes the creation of PT-LKB, new Portuguese word embeddings learned from a large lexical-semantic knowledge base (LKB), using the node2vec method. Resulting embeddings combine the strengths of word vector representations and, even with lower dimensions, achieve high scores in genuine similarity, which so far were obtained by exploiting the graph structure of LKBs.
- Published
- 2018
10. Computational Processing of the Portuguese Language
- Author
- Carlos Ramisch, Hugo Gonçalo Oliveira, and Renata Ramisch
- Published
- 2018
11. Unsupervised Approaches for Computing Word Similarity in Portuguese
- Author
- Hugo Gonçalo Oliveira
- Subjects
Lexical semantics; Computer science; business.industry; 02 engineering and technology; computer.software_genre; language.human_language; Task (project management); Single test; 03 medical and health sciences; 0302 clinical medicine; Semantic similarity; Similarity (psychology); 030221 ophthalmology & optometry; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Distributional semantics; Artificial intelligence; Portuguese; business; computer; Word (computer architecture); Natural language processing
- Abstract
This paper presents several approaches for computing word similarity in Portuguese and is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language that have been available for longer. These resources were exploited to answer word similarity tests, also recently available for Portuguese. We conclude that there are several valid approaches for this task, but not one that outperforms all the others in every single test. For instance, distributional models seem to capture relatedness better, but LKBs are better suited for computing genuine similarity.
- Published
- 2017
12. Gradually Improving the Computation of Semantic Textual Similarity in Portuguese
- Author
- Ricardo Rodrigues, Ana Alves, and Hugo Gonçalo Oliveira
- Subjects
business.industry; Computer science; 02 engineering and technology; computer.software_genre; Machine learning; Security token; language.human_language; SemEval; Task (project management); Semantic similarity; Negation; 020204 information systems; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; Portuguese; Unavailability; business; computer; Natural language processing
- Abstract
There is much research on Semantic Textual Similarity (STS) in English, especially since its inclusion in the SemEval evaluations. For other languages, it is not as common, mostly due to the unavailability of benchmarks. Recently, the ASSIN shared task targeted STS in Portuguese and released training and test collections. This paper describes an incremental approach to ASSIN, where the computed similarity is gradually improved by exploiting different features (e.g., token overlap, semantic relations, chunks, and negation) and approaches. The best reported results, obtained with a supervised approach, would get second place overall in ASSIN.
- Published
- 2017
13. Automatic Generation of Internet Memes from Portuguese News Headlines
- Author
- Diogo Costa, Alexandre Miguel Pinto, and Hugo Gonçalo Oliveira
- Subjects
Internet meme; Multimedia; Computer science; Computational humor; 06 humanities and the arts; 02 engineering and technology; 0603 philosophy, ethics and religion; computer.software_genre; language.human_language; World Wide Web; 060302 philosophy; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Portuguese; computer
- Abstract
This paper presents MemeGera, a prototype tool that generates image-based memes from Portuguese news headlines. Everything is done automatically, with the help of computational linguistic resources, which are described here along with the rules for selecting images and adapting the text.
- Published
- 2016
14. Automatic Generation of Poetry Inspired by Twitter Trends
- Author
- Hugo Gonçalo Oliveira
- Subjects
Computational creativity; Multimedia; Poetry; Social network; Grammar; Computer science; business.industry; media_common.quotation_subject; Semantic domain; computer.software_genre; Semantic network; Social media; Artificial intelligence; business; Set (psychology); computer; Natural language processing; media_common
- Abstract
This paper revisits PoeTryMe, a poetry generation platform, and presents its most recent instantiation for producing poetry inspired by trends in the Twitter social network. The presented system searches for tweets that mention a given topic, extracts the most frequent words in those tweets, and uses them as seeds for the generation of new poems. The set of seeds might still be expanded with semantically-relevant words. Generation is performed by the classic PoeTryMe system, based on a semantic network and a grammar, with a previously used generate&test strategy. Illustrative results are presented using different seed expansion settings. They show that the produced poems use semantically-coherent lines with words that, at the time of generation, were associated with the topic. Resulting poems are not really about the topic, but they are a way of expressing, poetically, what the system knows about the semantic domain set by the topic.
- Published
- 2016
15. CONTO.PT: Groundwork for the Automatic Creation of a Fuzzy Portuguese Wordnet
- Author
- Hugo Gonçalo Oliveira
- Subjects
Measure (data warehouse); Computer science; business.industry; Redundancy (linguistics); media_common.quotation_subject; WordNet; 02 engineering and technology; computer.software_genre; Fuzzy logic; language.human_language; Resource (project management); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Quality (business); Artificial intelligence; Portuguese; business; computer; Natural language processing; media_common
- Abstract
There are several lexical resources available for the computational processing of Portuguese, organised differently and created by different people with different approaches and limitations. This paper presents the first experiments towards the exploitation of seven of those resources in the automatic creation of a large wordnet, where numerical scores are assigned to the inclusion of words in synsets and to the connection of synsets by semantic relations. Experiments confirm that a large wordnet can indeed be created and, to some extent, computed scores can be used as a confidence measure, which will enable users to select only a portion of the resource, depending on their application's needs regarding the quantity and quality of lexical-semantic knowledge.
- Published
- 2016
16. Towards the Improvement of a Topic Model with Semantic Knowledge
- Author
- Adriana Ferrugento, Hugo Gonçalo Oliveira, Filipe Rodrigues, and Ana Alves
- Subjects
Topic model; Information retrieval; business.industry; Redundancy (linguistics); Computer science; WordNet; computer.software_genre; General semantics; Knowledge base; Semantic memory; Artificial intelligence; business; computer; Natural language processing; Generative grammar
- Abstract
Although typically used in classic topic models, surface words cannot represent meaning on their own. Consequently, redundancy is common in those topics, which may, for instance, include synonyms. To face this problem, we present SemLDA, an extended topic model that incorporates semantics from an external lexical-semantic knowledge base. SemLDA is introduced and explained in detail, pointing out where semantics is included in both the pre-processing and generative phases of topic distributions. As a result, instead of topics as distributions over words, we obtain distributions over concepts, each represented by a set of synonymous words. In order to evaluate SemLDA, we applied preliminary qualitative tests automatically against a state-of-the-art classical topic model. The results were promising and confirm our intuition towards the benefits of incorporating general semantics in a topic model.
- Published
- 2015
17. On the Utility of Portuguese Term-Based Lexical-Semantic Networks
- Author
- Hugo Gonçalo Oliveira
- Subjects
Focus (computing); Resource (project management); Computer science; language; Portuguese; Data science; language.human_language; Semantic network; Term (time)
- Abstract
This paper discusses the utility of term-based lexical networks, with a focus on Portuguese. Although less complex and often undervalued in comparison with wordnets, for Portuguese, these resources have been used in varied natural language processing tasks. We enumerate those, and then introduce a larger resource of this kind together with two additional tasks where it was useful.
- Published
- 2014
18. The Creation of Onto.PT: A Wordnet-Like Lexical Ontology for Portuguese
- Author
- Hugo Gonçalo Oliveira
- Subjects
Relation (database); Computer science; business.industry; WordNet; Context (language use); Ontology (information science); computer.software_genre; Relationship extraction; Lexical item; Upper ontology; Artificial intelligence; Computational linguistics; business; computer; Natural language processing
- Abstract
A wordnet is an important tool for developing natural language processing applications for a language, but the manual creation of such a resource limits its development. This dissertation studied the automatic construction of Onto.PT, a large Portuguese wordnet, aiming to minimise the main limitations of existing Portuguese wordnets. In this context, we propose ECO, an approach for creating wordnets automatically from text: relation instances are extracted, synonymy clusters (synsets) are discovered, and the remaining relations are then attached to suitable synsets. This document also reports on the contents of Onto.PT, its comparison to other wordnets, and its evaluation.
- Published
- 2014