Author: "Gutiérrez, Yoan" / Search Limiters: Full Text - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Gutiérrez, Yoan"' showing total 212 results

1. Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora

Author: Derner, Erik, de la Fuente, Sara Sansalvador, Gutiérrez, Yoan, Moreda, Paloma, and Oliver, Nuria
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Gender bias in text corpora that are used for a variety of natural language processing (NLP) tasks, such as for training large language models (LLMs), can lead to the perpetuation and amplification of societal inequalities. This phenomenon is particularly pronounced in gendered languages like Spanish or French, where grammatical structures inherently encode gender, making the bias analysis more challenging. A first step in quantifying gender bias in text entails computing biases in gender representation, i.e., differences in the prevalence of words referring to males vs. females. Existing methods to measure gender representation bias in text corpora have mainly been proposed for English and do not generalize to gendered languages due to the intrinsic linguistic differences between English and gendered languages. This paper introduces a novel methodology that leverages the contextual understanding capabilities of LLMs to quantitatively measure gender representation bias in Spanish corpora. By utilizing LLMs to identify and classify gendered nouns and pronouns in relation to their reference to human entities, our approach provides a robust analysis of gender representation bias in gendered languages. We empirically validate our method on four widely-used benchmark datasets, uncovering significant gender prevalence disparities with a male-to-female ratio ranging from 4:1 to 6:1. These findings demonstrate the value of our methodology for bias quantification in gendered language corpora and suggest its application in NLP, contributing to the development of more equitable language technologies.
Published: 2024

2. A comprehensive methodology to construct standardised datasets for Science and Technology Parks

Author: Francés, Olga, Fernández, Javi, Abreu-Salas, José, Gutiérrez, Yoan, and Palomar, Manuel
Published: 2024
Full Text: View/download PDF

3. KD SENSO-MERGER: An architecture for semantic integration of heterogeneous data

Author: Gutiérrez, Yoan, Salas, José I. Abreu, Montoyo, Andrés, Muñoz, Rafael, and Estévez-Velarde, Suilan
Published: 2024
Full Text: View/download PDF

4. Automatic annotation of protected attributes to support fairness optimization

Author: Consuegra-Ayala, Juan Pablo, Gutiérrez, Yoan, Almeida-Cruz, Yudivian, and Palomar, Manuel
Published: 2024
Full Text: View/download PDF

5. Intelligent ensembling of auto-ML system outputs for solving classification problems

Author: Consuegra-Ayala, Juan Pablo, Gutiérrez, Yoan, Almeida-Cruz, Yudivian, and Palomar, Manuel
Published: 2022
Full Text: View/download PDF

6. Why are some social-media contents more popular than others? Opinion and association rules mining applied to virality patterns discovery

Author: Saquete, Estela, Zubcoff, Jose, Gutiérrez, Yoan, Martínez-Barco, Patricio, and Fernández, Javi
Published: 2022
Full Text: View/download PDF

7. Bias mitigation for fair automation of classification tasks.

Author: Consuegra‐Ayala, Juan Pablo, Gutiérrez, Yoan, Almeida‐Cruz, Yudivian, and Palomar, Manuel
Subjects: *MACHINE learning, *AUTOMATIC classification, *SOURCE code, *SCIENTIFIC community, *FAIRNESS
Abstract: The incorporation of machine learning algorithms into high-risk decision-making tasks has raised some alarms in the scientific community. Research shows that machine learning-based technologies can contain biases that cause unfair decisions for certain population groups. The fundamental danger of ignoring this problem is that machine learning methods can not only reflect the biases present in our society but could also amplify them. This article presents the design and validation of a technology to assist the fair automation of classification problems. In essence, the proposal is based on taking advantage of the intermediate solutions generated during the resolution of classification problems through using Auto-ML tools, in particular, AutoGOAL, to create unbiased/fair classifiers. The technology employs a multi-objective optimization search to find the collection of models with the best trade-offs between performance and fairness. To solve the optimization problem, we introduce a combination of Probabilistic Grammatical Evolution Search and NSGA-II. The technology was evaluated using the Adult dataset from the UCI repository, a common benchmark in related research. Results were compared with other published results in scenarios with single and multiple fairness definitions. Our experiments demonstrate the technology's ability to automate classification tasks while incorporating fairness constraints. Additionally, our method achieves competitive results against other bias mitigation techniques. A notable advantage of our approach is its minimal requirement for machine learning expertise, thanks to its Auto-ML foundation. This makes the technology accessible and valuable for advancing fairness in machine learning applications. The source code is available online for the research community.
Published: 2024
Full Text: View/download PDF

8. Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs

Author: Consuegra-Ayala, Juan Pablo, Gutiérrez, Yoan, Piad-Morffis, Alejandro, Almeida-Cruz, Yudivian, and Palomar, Manuel
Published: 2021
Full Text: View/download PDF

9. General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution

Author: Estevez-Velarde, Suilan, Gutiérrez, Yoan, Almeida-Cruz, Yudivián, and Montoyo, Andrés
Published: 2021
Full Text: View/download PDF

10. KD SENSO-MERGER: An architecture for semantic integration of heterogeneous data

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Gutiérrez, Yoan, Abreu Salas, José Ignacio, Montoyo, Andres, Muñoz, Rafael, and Estévez-Velarde, Suilan
Abstract: This paper presents KD SENSO-MERGER, a novel Knowledge Discovery (KD) architecture that is capable of semantically integrating heterogeneous data from various sources of structured and unstructured data (i.e. geolocations, demographic, socio-economic, user reviews, and comments). This goal drives the main design approach of the architecture. It works by building internal representations that adapt and merge knowledge across multiple domains, ensuring that the knowledge base is continuously updated. To deal with the challenge of integrating heterogeneous data, this proposal puts forward the following solutions: (i) knowledge extraction, addressed via a plugin-based architecture of knowledge sensors; (ii) data integrity, tackled by an architecture designed to deal with uncertain or noisy information; (iii) scalability, also supported by the plugin-based architecture, as only knowledge relevant to the scenario is integrated by switching off non-relevant sensors. Also, we minimize the expert knowledge required, which may pose a bottleneck when integrating a fast-paced stream of new sources. As proof of concept, we developed a case study that deploys the architecture to integrate population census and economic data, municipal cartography, and Google Reviews to analyze the socio-economic contexts of educational institutions. The knowledge discovered enables us to answer questions that cannot be answered from individual sources. Thus, companies or public entities can discover patterns of behavior or relationships that would otherwise not be visible, which would allow them to extract valuable information for the decision-making process.
Published: 2024

11. The risky news sharing quotient (RNSQ): A research instrument for exploring news-sharing behaviour that spreads fake news

Author: Universidad de Alicante. Departamento de Filología Inglesa, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante. Instituto Universitario de Investigación Informática, Martin, Tania Josephine, Gutiérrez, Yoan, Sepúlveda-Torres, Robiert, and Abreu Salas, José Ignacio
Abstract: The spread of fake news (FN) has attracted attention from disciplines ranging from social sciences to Artificial Intelligence. This work is novel because it explores the news-sharing behaviour of social-media users, focussing on those that spread FN, rather than the psychological motivations behind them. The 14-item Risky News-Sharing Quotient (RNSQ) was developed and Exploratory Factor Analysis discovered three relevant factors: (i) news-sharing behaviour that contributes to debunking FN; (ii) news-sharing frequency and attitudes to sharing; and (iii) news-sharing behaviour that contributes to the spread of FN. The study, conducted among university students, found that 75% reported risky news-sharing behaviour that spreads FN. No link was found between perceiving FN as a problem and debunking it. Moreover, 83% of survey participants were unable to identify an FN story. Overall, the findings suggest an inability to apply knowledge of the relevant FN detection strategies to debunk FN and, importantly, an apparent lack of motivation to check the veracity of a news story. From these conclusions, better-informed educational intervention strategies can be implemented to address the FN problem in-situ, such as promoting the importance of responsible news-sharing by raising awareness of how the spread of FN can impede the proper functioning of societies.
Published: 2024

12. Automatic annotation of protected attributes to support fairness optimization

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante. Instituto Universitario de Investigación Informática, Consuegra-Ayala, Juan Pablo, Gutiérrez, Yoan, Almeida-Cruz, Yudivian, and Palomar, Manuel
Abstract: Recent research has shown that the unaware automation of high-risk decision-making tasks can result in unfair decisions being made. The most common approaches to address this problem adopt definitions of fairness based on protected attributes. Precise annotation of protected attributes enables the application of bias mitigation techniques to commonly unlabeled kinds of data (e.g., images, text, etc.). This paper proposes a framework to automatically annotate protected attributes in data collections. The framework focuses on providing a single interface to annotate protected attributes of different types (e.g., gender, race, etc.) and from different kinds of data. Internally, the framework coordinates multiple sensors to produce the final annotation. Several sensors for textual data are proposed. An optimization search technique is designed to tune the framework to specific domains. Additionally, a small dataset of movie reviews —annotated with gender and sentiment— was created. The evaluation in datasets of texts from diverse domains shows the quality of the annotations and their effectiveness to be used as a proxy to estimate fairness in datasets and machine learning models. The source code is available online for the research community.
Published: 2024

13. Geo.IA: Artificial Geo-Intelligence Platform to Solve Citizens Problems and Facilitate Strategic Decision Making in the Public Administration

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Montoyo, Andres, Muñoz, Rafael, and Gutiérrez, Yoan
Abstract: The objective of Geo-IA is to research, design and implement a Geo-Smart Artificial Intelligence technology platform for public and private business organizations. The GeoIA project presents a geolocation platform that integrates technological innovation to support a strategy for the creation of Smart Territories. To do this, Text Mining, Machine Learning (including deep learning) and Natural Language Processing technologies are deployed. The functionality of the geolocation platform is to analyze, integrate, share data, visualize and represent territorial indicators, with the aim of facilitating the monitoring and fulfillment of territorial strategies. In short, GeoIA promotes interoperability between public administration bodies and also provides citizens with mechanisms to access information of interest, where the magnitude of the integrated and interrelated data permits. GeoIA also provides digital knowledge (tools, linked information, semantics, virtual assistants) for use by public administrations to enhance their decision making through greater knowledge of the environment and to improve services to citizens.
Published: 2024

14. Mitigación de Sesgos para la Automatización Justa de Tareas de Clasificación

Author: Gutiérrez, Yoan, Almeida-Cruz, Yudivian, Palomar, Manuel, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, and Consuegra-Ayala, Juan Pablo
Abstract: Machine learning models are widely used across many areas of human life. Traditionally, they have been applied to speech recognition, face detection, image classification, recommender systems, and so on. With the recent revolution in generative models, the popularity of conversational chatbots has soared. As a result, machine learning models are increasingly used to tackle tasks for which they were not specifically trained. Prompt engineering has allowed people who are not machine learning experts (and who are commonly also unfamiliar with the problems underlying the use of machine learning models to make predictions) to automate certain tasks. The incorporation of machine learning algorithms into high-risk decision-making tasks has raised some alarms in the scientific community. High-risk decision-making tasks are those that can have a major impact on the lives of the people about whom the decisions are made. For example, models have been used to decide whether a person is hired, whether a loan is granted, whether an application for extended health coverage is accepted, and to predict the likelihood of criminal recidivism. Studies have shown that the unaware automation of such tasks contains biases, which leads to unfair decisions being made about certain population groups. The fundamental danger of ignoring this problem is that machine learning methods could not only reflect the biases present in our society but also amplify them. This thesis presents the design and validation of a technology to assist the fair automation of classification problems.
In essence, the proposal is based on designing a technology that takes advantage of the intermediate solutions gen
Published: 2024

15. A comprehensive methodology to construct standardised datasets for Science and Technology Parks

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Francés Hernández, Olga, Fernández Martínez, Javier, Abreu Salas, José Ignacio, Gutiérrez, Yoan, and Palomar, Manuel
Abstract: This work presents a standardised approach to create datasets for Science and Technology Parks (STPs), facilitating future analysis of STP characteristics, trends and performance. STPs are the most representative examples of innovation ecosystems. The ETL (extraction-transformation-load) structure was adapted to a global field study of STPs. A selection stage and quality check were incorporated, and the methodology was applied to Spanish STPs. This study applies diverse techniques such as expert labelling and information extraction using language technologies. A novel methodology for building quality and standardised STP datasets was designed and applied to a Spanish STP case study with 49 STPs. An updatable dataset and a list of the main features impacting STPs are presented. Twenty-one (n = 21) core features were refined and selected, with fifteen of them (71.4%) being robust enough for developing further quality analysis. The methodology presented integrates different sources with heterogeneous information that is often decentralised, disaggregated and in different formats: Excel files, and unstructured information in HTML or PDF format. The existence of this updatable dataset and the defined methodology will enable powerful AI tools to be applied that focus on more sophisticated analysis, such as taxonomy, monitoring, and predictive and prescriptive analytics in the innovation ecosystems field.
Published: 2024

16. Exploring Conceptual Metaphor Types in Financial Markets Reporting: Mainstream vs. Social Media

Author: Universidad de Alicante. Departamento de Filología Inglesa, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Martin, Tania Josephine, Abreu Salas, José Ignacio, and Gutiérrez, Yoan
Abstract: This study contributes to English for Specific Purposes (ESP) pedagogy by providing an updated examination of conceptual metaphor types (CMTs) employed in financial markets reporting. The investigation delves into the prevalence of CMTs in both social media and mainstream media contexts. Robust patterns were identified to distinguish CMT usage between mainstream and social media by leveraging large-scale data analysis, encompassing 38.6 million documents from The Financial Times, The Wall Street Journal, Twitter and Reddit. The data collection spans fifteen months during the COVID-19 pandemic (January 2020–March 2021), marked by socioeconomic upheaval and coinciding with a surge in retail investors using low or no-cost mobile financial trading applications. Examining the proportion of CMTs reveals that war/combat, markets animate, markets inanimate, and health metaphor types have strong associations with both social and mainstream media. The gambling CMT is predominantly linked to social media. In terms of metaphor density, the results indicate a higher concentration in social media compared to mainstream media. Texts sourced from social media, characterized by greater conciseness, emerge as a potential communication barrier. The findings underscore the importance of incorporating authentic texts from social media into specialized language courses, thus enhancing language learning experiences in the domain of financial markets reporting.
Published: 2024

17. OntoLM: Integrating Knowledge Bases and Language Models for classification in the medical domain

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Yáñez Romero, Fabio, Montoyo, Andres, Muñoz, Rafael, Gutiérrez, Yoan, and Suárez Cueto, Armando
Abstract: Large language models have shown impressive performance in Natural Language Processing tasks, but their black-box nature makes it difficult to explain the model’s decisions and to integrate semantic knowledge. There has been a growing interest in combining external knowledge sources with language models to address these drawbacks. This paper proposes OntoLM, a novel architecture combining an ontology with a pre-trained language model to classify biomedical entities in text. This approach involves constructing and processing graphs from ontologies and then using a graph neural network to contextualize each entity. Next, the language model and the graph neural network output are combined into a final classifier. Results show that OntoLM improves the classification of entities in medical texts using a set of categories obtained from the Unified Medical Language System. We can create more traceable natural language processing architectures using ontology graphs and graph neural networks.
Published: 2024

18. A computational ecosystem to support eHealth Knowledge Discovery technologies in Spanish

Author: Piad-Morffis, Alejandro, Gutiérrez, Yoan, Almeida-Cruz, Yudivian, and Muñoz, Rafael
Published: 2020
Full Text: View/download PDF

19. Developing an ontology schema for enriching and linking digital media assets

Author: Gutiérrez, Yoan, Tomás, David, and Moreno, Isabel
Published: 2019
Full Text: View/download PDF

20. A corpus to support eHealth Knowledge Discovery technologies

Author: Piad-Morffis, Alejandro, Gutiérrez, Yoan, and Muñoz, Rafael
Published: 2019
Full Text: View/download PDF

21. Exploring Conceptual Metaphor Types in Financial Markets Reporting: Mainstream vs. Social Media

Author: Martin, Tania Josephine, Salas, José I. Abreu, and Gutiérrez, Yoan
Published: 2024
Full Text: View/download PDF

22. Multidimensional Data Analysis for Enhancing In-Depth Knowledge on the Characteristics of Science and Technology Parks

Author: Francés, Olga, Abreu-Salas, José, Fernández, Javi, Gutiérrez, Yoan, and Palomar, Manuel
Published: 2023
Full Text: View/download PDF

23. Desarrollo de un modelo de Procesamiento del Lenguaje Natural para la extracción de información en documentos del dominio de la salud

Author: Gutiérrez, Yoan, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, and Grande Ruiz, Eduardo
Abstract: There are currently many artificial intelligence models focused on named entity detection that can detect a wide variety of aspects. In this work, those aspects are narrowed to rare diseases, which are detected in clinical texts. All of these clinical texts are abstracts of scientific documents published in PubMed. Beyond the names of the diseases themselves, the aim is to detect a wide variety of aspects related to those diseases, such as their causes, treatments, diagnoses... All of these aspects are classified into a set of categories. The annotations for the model are first generated automatically using the Metathesaurus tool, contained within UMLS, a medical language system. The Metathesaurus contains more than 3 million concepts, the vast majority of them from the clinical domain. It also has a set of predefined categories, with the concepts classified into them. For each text, there is a txt file containing the text and an ann file containing its annotations. Those annotations are defined in BRAT format, an annotation format that makes it easy to visualize them afterwards, modify them, and create new ones. For each annotation, the start, the end, the category it belongs to, and the word or group of words it applies to are specified. Once those annotations are available, they can be reviewed manually so that the corpus is of the highest possible quality, and having a base of annotations already in place makes this task faster.
The required classification is complex, since it involves quite a few categories and each word (or group of words) can belong to several classes at once, so annotations can overlap either strictly (same start and end) or partially. To obtain the model
Published: 2023

24. T2KG: Transforming Multimodal Document to Knowledge Graph

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Galiano Segura, Santiago, Muñoz, Rafael, Gutiérrez, Yoan, Montoyo, Andres, and Abreu Salas, José Ignacio
Abstract: The large amount of information in digital format that exists today makes it unfeasible to use manual means to acquire the knowledge contained in these documents. Therefore, it is necessary to develop tools that allow us to incorporate this knowledge into a structure that is easy to use by both machines and humans. This paper presents a system that can incorporate the relevant information from a document in any format, structured or unstructured, into a semantic network that represents the existing knowledge in the document. The system handles each kind of document separately: structured documents are processed according to their annotation scheme, while unstructured documents, written in natural language, are processed with a set of sensors that identify the relevant information. That information is then incorporated to enrich the semantic network, which is created by linking all the information based on the knowledge discovered.
Published: 2023

25. Multidimensional Data Analysis for Enhancing In-Depth Knowledge on the Characteristics of Science and Technology Parks

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Francés Hernández, Olga, Abreu Salas, José Ignacio, Fernández Martínez, Javier, Gutiérrez, Yoan, and Palomar, Manuel
Abstract: The role played by science and technology parks (STPs) in technology transfer, industrial innovation, and economic growth is examined in this paper. The accurate monitoring of their evolution and impact is hindered by the lack of uniformity in STP models or goals, and the scarcity of high-quality datasets. This work uses existing terminologies, definitions, and core features of STPs to conduct a multidimensional data analysis that explores and evaluates the 21 core features which describe the key internal factors of an STP. The core features are gathered from a reliable and updatable dataset of Spanish STPs. The methodological framework can be replicated for other STP contexts and is based on descriptive techniques and machine-learning tools. The results of the study provide an overview of the general situation of STPs in Spain, validate the existence and characteristics of three types of STPs, and identify the typical features of STPs. Moreover, the prototype STP can be used as a benchmark so that other STPs can identify the features that need to be improved. Finally, this work makes it possible to carry out classifications of STPs, in addition to prediction and decision making for innovation ecosystems.
Published: 2023

26. T2Know: An Advance Scientific-Tecnical Text Analysis Platform for Trend and Knowledge Extraction Using NLP Techniques

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Muñoz, Rafael, Gutiérrez, Yoan, and Montoyo, Andres
Abstract: The T2Know project presents the use of natural language processing technologies for the creation of a semantic platform of scientific documents via knowledge graphs. This knowledge graph will link relevant parts of each document with those of other documents in such a way that trend analysis and recommendations can be achieved. The goals addressed within the scope of this project include the development of entity recognizers, profile definition, and document linkage through the use of transformer technologies. As a result, the relevant parts of the documents to be extracted are related not only to the title and affiliation of the authors, but also to article topics such as references, which are also considered relevant parts of the scientific article.
Published: 2023

27. A Review in Knowledge Extraction from Knowledge Bases

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante. Instituto Universitario de Investigación Informática, Yáñez Romero, Fabio, Montoyo, Andres, Muñoz, Rafael, Gutiérrez, Yoan, and Suárez Cueto, Armando
Abstract: Generative language models achieve the state of the art in many tasks within natural language processing (NLP). Although these models correctly capture syntactic information, they fail to interpret knowledge (semantics). Moreover, the lack of interpretability of these models promotes the use of other technologies as a replacement or complement to generative language models. This is the case with research focused on incorporating knowledge by resorting to knowledge bases mainly in the form of graphs. The generation of large knowledge graphs is carried out with unsupervised or semi-supervised techniques, which promotes the validation of this knowledge with the same type of techniques due to the size of the generated databases. In this review, we will explain the different techniques used to test and infer knowledge from graph structures with machine learning algorithms. The motivation of validating and inferring knowledge is to use correct knowledge in subsequent tasks with improved embeddings.
Published: 2023

28. Generación y pesado de skipgrams y su aplicación al análisis de sentimientos

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Fernández Martínez, Javier, Gutiérrez, Yoan, and Martínez-Barco, Patricio
Abstract: Skipgram modelling is a technique for generating multi-word terms that preserves some of the sequentiality and flexibility of language. However, in some cases the number of skipgrams generated may become excessive as the distance between words increases. Moreover, this distance is often not taken into account when evaluating the generated terms. In this paper we propose a technique for efficient skipgram generation and filtering, and a weighting scheme that takes into account the distance between terms, giving more importance to those that are closer together. We apply and evaluate these proposals in the task of sentiment analysis.
Published: 2023

29. A semantic framework for textual data enrichment

Author: Gutiérrez, Yoan, Vázquez, Sonia, and Montoyo, Andrés
Published: 2016
Full Text: View/download PDF

30. Skipgrams Generation and Weighting and its Application to Sentiment Analysis

Author: Fernández Martínez, Javier, Gutiérrez, Yoan, Martínez-Barco, Patricio, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, and Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Subjects: Term generation, Sentiment analysis, Term weighting, Skipgrams
Abstract: Skipgram modelling is a technique for generating multi-word terms that preserves some of the sequentiality and flexibility of language. However, in some cases the number of skipgrams generated may become excessive as the distance between words increases. Moreover, this distance is often not taken into account when evaluating the generated terms. In this paper we propose a technique for efficient skipgram generation and filtering, and a weighting scheme that takes into account the distance between terms, giving more importance to those that are closer together. We apply and evaluate these proposals in the task of sentiment analysis. This research was funded by the University of Alicante, the Spanish Ministry of Science and Innovation, the Generalitat Valenciana, and the European Regional Development Fund (ERDF) through the following grants: at the national level, the projects TRIVIAL (PID2021-122263OB-C22), Social-Trust (PDC2022-133146-C22), and CLEARTEXT (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR; at the regional level, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) funded NL4DISMIS (CIPROM/2021/21). It was also supported by two COST Actions: CA19134 - “Distributed Knowledge Graphs” and CA19142 - “Leading Platform for European Citizens, Industries, Academia, and Policymakers in Media Accessibility”.
Published: 2023

31. Why are some social-media contents more popular than others? Opinion and association rules mining applied to virality patterns discovery

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante. Departamento de Ciencias del Mar y Biología Aplicada, Saquete Boró, Estela, Zubcoff, Jose, Gutiérrez, Yoan, Martínez-Barco, Patricio, and Fernández Martínez, Javier
Abstract: This research focuses on discovering the main features of virality patterns on Twitter. Five trending topics related to the COVID-19 pandemic were selected for the study, with Spanish as the target language. To discover virality patterns, we applied opinion mining techniques that structure the information according to the polarity of the messages and the emotions they contain. After transforming the information from an unstructured textual representation into a structured one, data mining techniques were applied, specifically association rule mining. We extracted the message patterns with the highest virality (high shares and high likes), as well as the most relevant characteristics of the patterns with less impact. After an exhaustive analysis of the most relevant non-redundant rules, it can be concluded that messages with a high negative polarity and a very high emotional charge, especially emotions that have intensified with the COVID-19 pandemic, such as fear, sadness, anger, and surprise, are more likely to go viral on social media. By contrast, little news coverage in the media, few authors, and the absence of surprise are relevant features of messages with very low dissemination on social media.
Published: 2022

32. Intelligent Ensembling of Auto-ML System Outputs for Solving Classification Problems

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante. Instituto Universitario de Investigación Informática, Consuegra-Ayala, Juan Pablo, Gutiérrez, Yoan, Almeida-Cruz, Yudivian, and Palomar, Manuel
Abstract: Automatic Machine Learning (Auto-ML) tools enable the automatic solution of real-world problems through machine learning techniques. These tools tend to be more time consuming than standard machine learning libraries; therefore, making full use of all available resources is a valuable feature. This paper presents a two-phase optimization system for solving classification problems. The system is designed to produce more robust classifiers by exploiting the different architectures that are generated while solving classification problems with Auto-ML tools, particularly AutoGOAL. In the first phase, the system follows a probabilistic strategy to find the best combination of algorithms and hyperparameters and generates a collection of base models according to certain diversity criteria; in the second, it follows similar Auto-ML strategies to ensemble those models. The HAHA 2019 challenge corpus and the Adult dataset were used to evaluate the system. The experimental results show that: i) a better solution can be built by ensembling a subset of the already tested models; ii) the performance of ensemble methods depends on the collection of base models used; and iii) ensuring diversity using the double-fault measure produces better results than the disagreement measure. The source code is available online for the research community.
Published: 2022

33. Resumen de la Tarea de Descubrimiento de Conocimiento en Salud en IberLEF 2021

Author: Piad-Morffis, Alejandro, Estévez-Velarde, Suilan, Gutiérrez, Yoan, Almeida-Cruz, Yudivian, Montoyo, Andres, Muñoz, Rafael, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, and Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Subjects: Task, Lenguajes y Sistemas Informáticos, Challenge, Knowledge Discovery, Named Entity Recognition, Relation Extraction
Abstract: This paper summarises the eHealth Knowledge Discovery Challenge hosted at IberLEF 2021. We describe the task, resources, and participating systems, highlighting and discussing the main results achieved in the challenge. We analyse the best performing systems and present recommendations for future research. This research has been supported by a Carolina Foundation grant in agreement with the University of Alicante and the University of Havana. Moreover, the research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d'Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22), INTEGER (RTI2018-094649-B-I00) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089). Additionally, it has been backed by the work of both COST Actions: CA19134 – “Distributed Knowledge Graphs” and CA19142 – “Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility”.
Published: 2021

34. Analysing the Twitter accounts of licensed Sports gambling operators in Spain: a space for responsible gambling?

Author: Hernández-Ruiz, Alejandra and Gutiérrez, Yoan
Published: 2021
Full Text: View/download PDF

35. Hacia la democratización del aprendizaje de máquinas usando AutoGOAL

Author: Estevanell-Valladares, Ernesto L., Estévez-Velarde, Suilan, Piad-Morffis, Alejandro, Gutiérrez, Yoan, Montoyo, Andres, Almeida-Cruz, Yudivian, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, and Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Subjects: Artificial intelligence, Automated learning, Machine learning, Lenguajes y Sistemas Informáticos, AutoML
Abstract: Machine learning is a field of artificial intelligence that has recently gained interest in all areas of industry, motivated primarily by the accelerated growth of computing capabilities and data availability. However, one of the main difficulties in applying it is the need for experts who know the internal details of the many models that can be used. In this context, a new field of study has emerged, AutoML (Automated Machine Learning), which facilitates the use of these techniques by experts from other domains. This paper presents a concrete proposal of a system, AutoGOAL, designed to solve machine learning problems of various kinds. In addition, a brief comparison is made between relevant existing systems in the field. The proposal is competitive with state-of-the-art tools in classic machine learning problems, and it can be seamlessly deployed in more complex domains, such as natural language processing. AutoGOAL is another step towards the democratization of machine learning for non-expert users.
Published: 2021

36. Hacia la democratización del aprendizaje de máquinas usando AutoGOAL

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Estevanell-Valladares, Ernesto L., Estévez-Velarde, Suilan, Piad-Morffis, Alejandro, Gutiérrez, Yoan, Montoyo, Andres, and Almeida-Cruz, Yudivian
Abstract: Machine learning is a field of artificial intelligence that has recently gained interest in all areas of industry, motivated primarily by the accelerated growth of computing capabilities and data availability. However, one of the main difficulties in applying it is the need for experts who know the internal details of the many models that can be used. In this context, a new field of study has emerged, AutoML (Automated Machine Learning), which facilitates the use of these techniques by experts from other domains. This paper presents a concrete proposal of a system, AutoGOAL, designed to solve machine learning problems of various kinds. In addition, a brief comparison is made between relevant existing systems in the field. The proposal is competitive with state-of-the-art tools in classic machine learning problems, and it can be seamlessly deployed in more complex domains, such as natural language processing. AutoGOAL is another step towards the democratization of machine learning for non-expert users.
Published: 2021

37. PCT Observer Tablero de Parques Científicos/Tecnológicos

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Francés Hernández, Olga, Abreu Salas, José Ignacio, Gutiérrez, Yoan, Fernández Martínez, Javier, and Palomar, Manuel
Abstract: PCT Observer is a web application for analyzing and visualizing key indicators of scientific/technological parks. It helps the user discover statistically significant differences or relationships between indicators, compare their time series, and explore the features that best characterize the different park types.
Published: 2021

38. Descubrimiento Automático de Flujos de Aprendizaje de Máquina basado en Gramáticas Probabilísticas

Author: Gutiérrez, Yoan, Montoyo, Andres, Almeida-Cruz, Yudivian, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, and Estévez-Velarde, Suilan
Abstract: Machine learning has gained ground and is now used in almost every area of daily life, supporting decision making in finance, medicine, commerce, and entertainment. The continuous development of new machine learning algorithms and techniques, together with the wide range of available tools and datasets, has brought new opportunities and challenges for researchers and practitioners in both academia and industry. Selecting the best possible strategy for solving a machine learning problem is increasingly difficult, in part because it requires long experimentation times and deep technical knowledge. In this scenario, the research field of Automated Machine Learning (AutoML) has gained prominence, proposing strategies to progressively automate common tasks in the development of machine learning applications. The most common AutoML tools automatically select, from a restricted set of algorithms and parameters, the best strategy for a given dataset. However, practical problems often require combining and comparing heterogeneous algorithms implemented with different underlying technologies. One example is natural language processing, a scenario in which the space of applicable techniques varies widely between tasks, from preprocessing to text representation and classification. Performing AutoML in a heterogeneous scenario such as this is complex because the required solution could include tools and libraries that are not compatible with each other. This would require all algorithms to agree on a common protocol that allows the output of one algorithm to be shared as input to any other. In this research, an AutoML system that uses heterogeneous techniques is designed and implemented. Unlike existing AutoML approaches, our contribution can
Published: 2021

39. Ecosistema para el Descubrimiento de Conocimiento en Lenguaje Natural

Author: Gutiérrez, Yoan, Almeida-Cruz, Yudivian, Muñoz, Rafael, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante. Instituto Universitario de Investigación Informática, and Piad-Morffis, Alejandro
Abstract: The growing amount of information published online poses a significant challenge for the scientific community. The availability of these resources makes it possible to accelerate research in many branches of science by connecting the results of different research groups. However, the volume of information produced is impossible for humans to process in its entirety, so the scientific community wastes time and resources rediscovering the same results because of a lack of communication. Applying artificial intelligence techniques makes it possible to build computational systems that help researchers search, analyse, and connect the information contained in large volumes of data. This process is called automatic knowledge discovery and is a research area of growing interest. The health domain is one of the scenarios in which automatic knowledge discovery can have the greatest impact for the benefit of society. The recent COVID-19 pandemic is an example in which the production of scientific articles has far exceeded the scientific community's capacity to assimilate them. To mitigate this phenomenon, linguistic resources have been published that make it possible to build automatic knowledge discovery systems. However, knowledge discovery requires not only linguistic resources but also computational resources and infrastructure for evaluating results systematically and comparing alternative approaches objectively. This work describes an ecosystem that facilitates research and development in knowledge discovery in the biomedical domain, specifically in Spanish, although it can be extended to other domains and languages. To this end, several resources are developed and shared with the research community, including a new semantic annotation model, four corpora with more than
Published: 2021

40. Applying Smarta to the analysis of tourist networks

Author: Universidad de Alicante. Departamento de Matemática Aplicada, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Lloret-Climent, Miguel, Nescolarde-Selva, Josué Antonio, Alonso-Stenberg, Kristian, Montoyo, Andres, and Gutiérrez, Yoan
Abstract: The framework of the present study was the destination life cycle model, a classical model that describes the development of tourist destinations. We examined mass tourism in Benidorm based on tourist accommodation supply and demand statistics for the period from January 2016 to October 2018, provided by Spain's National Institute for Statistics. The objective was to analyze the life cycle and competitiveness of Benidorm's tourism system and to determine whether the tourism product was sustainable and what stage of the cycle Benidorm is currently in. To do this, we used the Smarta software, which, based on network analysis, makes it possible to interpret the system's virtuous cycles and analyze causality by observing relationship patterns in the system's attractors, thus complementing typical processing based on causal maps and the study of social networks. The results obtained with this application (which was developed by our research team) show six sets of attractors that mark the trends of the tourist system. Finally, the analysis of the significant variables of these attractors supports the conclusion that the tourist system of Benidorm is in the rejuvenation phase.
Published: 2021

41. Analysing the Twitter accounts of licensed Sports gambling operators in Spain: a space for responsible gambling?

Author: Universidad de Alicante. Departamento de Comunicación y Psicología Social, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Hernández-Ruiz, Alejandra, and Gutiérrez, Yoan
Abstract: Apart from the economic impact of the online gambling industry, the social, public order and health-related consequences of the industry merit analysis to inform appropriate action, regulatory or otherwise. The omnipresence of ICTs, the inability to use technologies properly, along with the growth of online gambling channels, have acted simultaneously as a catalyst for the spread of pathological and problematic gambling. In this context, social networks have become a highly effective platform to instil positive attitudes towards the products of gambling operators. This work uses the Natural Language Processing based web application “GPLSI Social Analytics” to track, in real time, the conversations generated on Twitter about the Spanish domain accounts of the main online sports gambling operators. The findings indicate that most of the messages about these operators are positive and surprise is the predominant emotion associated with them. The notion of responsible online gambling barely receives a mention in the conversations analysed. Given the role of new technologies as access facilitators and potential enhancers of addictive behaviours, it is necessary to adopt measures directed at social networks that guarantee the coexistence of the right to freedom of expression with the protection of the most vulnerable populations.
Published: 2021

42. Overview of the eHealth Knowledge Discovery Challenge at IberLEF 2021

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Piad-Morffis, Alejandro, Estévez-Velarde, Suilan, Gutiérrez, Yoan, Almeida-Cruz, Yudivian, Montoyo, Andres, and Muñoz, Rafael
Abstract: This paper summarises the eHealth Knowledge Discovery Challenge hosted at IberLEF 2021. We describe the task, resources, and participating systems, highlighting and discussing the main results achieved in the challenge. We analyse the best performing systems and present recommendations for future research.
Published: 2021

43. Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Consuegra-Ayala, Juan Pablo, Gutiérrez, Yoan, Piad-Morffis, Alejandro, Almeida-Cruz, Yudivian, and Palomar, Manuel
Abstract: Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems’ outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs.
Published: 2021

44. GPLSI FieroBoT: Asistente virtual para la captación de insultos

Author: Universidad de Alicante. Instituto Universitario de Investigación Informática, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Botella Gil, Beatriz, Gutiérrez, Yoan, Martínez-Barco, Patricio, and Palomar, Manuel
Abstract: FieroBoT is a virtual assistant that collects insults from users of the messaging application Telegram. Users can hold an anonymous, free conversation with the robot, in which they are prompted to insult it. The process first asks users for their gender and age, after which they are free to insult the robot. Only the content of the message and generic demographic data, such as age range and gender, are stored in order to categorize the content. The collected insults will be used to create a resource of violent words and expressions that can be used in different studies related to vernacular insults.
Published: 2021

45. GPLSI-UH LETO V1.0: Motor de aprendizaje a través de ontologías

Author: Universidad de Alicante. Instituto Universitario de Investigación Informática, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Estévez-Velarde, Suilan, Piad-Morffis, Alejandro, Gutiérrez, Yoan, Montoyo, Andres, Muñoz, Rafael, Almeida-Cruz, Yudivian, Palomar, Manuel, and Valdés Pérez, Daniel Alejandro
Abstract: LETO is an ontology learning framework designed to extract knowledge from a variety of sources. These sources may be structured or unstructured data, and from them relevant information can be discovered, continuously updated, enriched, and integrated as part of a single semantic knowledge resource. The current version 1.0 is limited to extracting knowledge from unstructured data, i.e. natural language texts, following the semantic model published in [EGM2018]. Its functionalities include the extraction of entities and semantic relations from textual sources; the transformation of this information into linked elements through clustering techniques; and, finally, the generation of ontologies that represent the processed content. An API access point and a visual tool for manipulating processes and visualizing the resulting ontologies are provided [EMA2019].
Published: 2021

46. General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Estévez-Velarde, Suilan, Gutiérrez, Yoan, Almeida-Cruz, Yudivian, and Montoyo, Andres
Abstract: This paper introduces Hierarchical Machine Learning Optimisation (HML-Opt), an AutoML framework that is based on probabilistic grammatical evolution. HML-Opt has been designed to provide a flexible framework where a researcher can define the space of possible pipelines to solve a specific machine learning problem, which can range from high-level decisions about representation and features to low-level hyper-parameter values. The evaluation of HML-Opt is presented via two different case studies, both of which demonstrate that it is competitive with existing AutoML tools on a variety of benchmarks. Furthermore, HML-Opt can be applied to novel problems, such as knowledge extraction from natural language text, whereas other techniques are insufficiently flexible to capture the complexity of these scenarios. The source code for HML-Opt is available online for the research community.
Published: 2021

47. Overview of the eHealth Knowledge Discovery Challenge at IberLEF 2020

Author: Piad-Morffis, Alejandro, Gutiérrez, Yoan, Cañizares-Diaz, Hian, Estévez-Velarde, Suilan, Muñoz, Rafael, Montoyo, Andres, Almeida-Cruz, Yudivian, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante. Instituto Universitario de Investigación Informática, and Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Subjects: Machine Learning, Lenguajes y Sistemas Informáticos, eHealth, Knowledge Discovery, Natural Language Processing
Abstract: This paper summarises the results of the third edition of the eHealth Knowledge Discovery (KD) challenge, hosted at the Iberian Language Evaluation Forum 2020. The eHealth-KD challenge proposes two computational tasks involving the identification of semantic entities and relations in natural language text, focusing on Spanish language health documents. In this edition, besides text extracted from medical sources, Wikipedia content was introduced into the corpus, and a novel transfer-learning evaluation scenario was designed that challenges participants to create systems that provide cross-domain generalisation. A total of eight teams participated with a variety of approaches including deep learning end-to-end systems as well as rule-based and knowledge-driven techniques. This paper analyses the most successful approaches and highlights the most interesting challenges for future research in this field. This research has been partially supported by the University of Alicante and University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects SIIA (PROMETEO/2018/089, PROMETEU/2018/089) and LIVING-LANG (RTI2018-094653-B-C22).
Published: 2020

48. GPLSI AitanaWEB. Asistente Virtual sobre procesos de matriculación académica

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Gutiérrez, Yoan, Abreu Salas, José Ignacio, Montoyo, Andres, and Muñoz, Rafael
Abstract: AitanaWEB is a chatbot that provides remote assistance to users on academic enrollment processes and related matters. It offers information on topics such as timetables, admission cut-off grades, enrollment, and academic record transfers, among others. It is designed with accessibility and usability in mind and can be used in Valencian and Spanish. It can be accessed from different browsers, such as Chrome, Firefox, Edge, Chrome for Android, and Safari. It incorporates a narrator and speech recognition features, allowing the user to interact with the system by voice. The project has two basic components: (i) Google's DialogFlow as the Artificial Intelligence service in which questions and answers are structured and trained; and (ii) an in-house component that acts as controller, interpreter, and router between DialogFlow and the end-user interface.
Published: 2020

49. Propuesta metodológica para el desarrollo de sistemas automáticos de evaluación cualitativa en el dominio educativo

Author: Tomás, David, Gutiérrez, Yoan, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, and Herrera-Flores, Boris
Abstract: This thesis discusses the importance of educational evaluation and proposes a methodology for capturing opinions in a non-traditional way. It sets out a framework for obtaining more quickly how students feel about the academic offering provided by their educational institution, across a broader range of emotional nuances than traditional surveys yield. This methodology for obtaining quantitative and qualitative data is based on Natural Language Processing (NLP) techniques: it proposes capturing and collecting data with automatic techniques and analyzing them to evaluate the performance of the data obtained, which fed a sentiment analysis system based on machine learning. The proposed methodology makes it possible to acquire a corpus of student opinions in the specific domain of education, and thereby to train a sentiment analysis system that accurately captures opinions on different aspects of the educational sphere. This type of methodological proposal is especially relevant in Latin American countries, where teacher evaluation is a recent process that needs a trial period to determine its scope. The results of applying the proposed methodology support decision-making in the educational institution where it is used, leading to governance assisted by computational techniques, consistent with the demand for quality in education.
Published: 2020

50. Automatic Discovery of Heterogeneous Machine Learning Pipelines: An Application to Natural Language Processing

Author: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Estévez-Velarde, Suilan, Gutiérrez, Yoan, Montoyo, Andres, and Almeida-Cruz, Yudivian
Abstract: This paper presents AutoGOAL, a system for automatic machine learning (AutoML) that uses heterogeneous techniques. In contrast with existing AutoML approaches, our contribution can automatically build machine learning pipelines that combine techniques and algorithms from different frameworks, including shallow classifiers, natural language processing tools, and neural networks. We define the heterogeneous AutoML optimization problem as the search for the best sequence of algorithms that transforms specific input data into the desired output. This provides a novel theoretical and practical approach to AutoML. Our proposal is experimentally evaluated on diverse machine learning problems and compared with alternative approaches, showing that it is competitive with other AutoML alternatives on standard benchmarks. Furthermore, it can be applied to novel scenarios, such as several NLP tasks, where existing alternatives cannot be directly deployed. The system is freely available and includes built-in compatibility with a large number of popular machine learning frameworks, which makes our approach useful for solving practical problems with relative ease.
Published: 2020
