131 results on '"Arias Vicente, Marta"'
Search Results
2. Causal discovery and prediction: methods and algorithms
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Gavaldà Mestre, Ricard, Arias Vicente, Marta, and Blondel, Gilles
- Abstract
(English) This thesis focuses on the discovery of causal relations and on the prediction of causal effects. Regarding causal discovery, it introduces a novel and generic method to learn causal graphs by performing a sequence of interventions, where each intervention is applied on a single value of the intervened variables, while minimizing the overall cost of the sequence of intervened and observed variables during the discovery process. Regarding causal effect prediction, it introduces a comprehensive causal reasoning method for models recurrent in time. All causal models are assumed to contain hidden confounders that influence observed variables in the causal model, except when explicitly referring to causal models without hidden confounders as a sub-case. All variables are also assumed to take values in a finite domain. Contributions to the discovery of causal relations: Our method for the discovery of causal relations introduces several novelties. Firstly, we use interventions on a single value of the intervened variables. All previous methods require interventions on several values of the intervened variables in order to measure correlation or conditional independence among variables. By treating do-calculus as a tool to predict systematically and numerically the effect of all possible interventions, without having to actually perform them, we move the search space out of the real world and eliminate the need for systematic correlation and independence testing in the real world. We assume that computational cost is not a concern compared with the cost of actually experimenting in the real world. Secondly, we accept any set of candidate graphs as input to our method. Previous knowledge may or may not be in the form of an equivalence class of graphs, and the set of candidate graphs may or may not have any particular parametric characteristic. Some candidate graphs may have been discarded previously, Postprint (published version)
- Published
- 2023
3. ML empowered vulnerability pattern detection
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Calvo Ibáñez, Albert, Arias Vicente, Marta, and Sánchez i Casals, Alexandre
- Abstract
This report presents SIEVA, an AI-powered software that utilizes internet traffic logs to provide a taxonomy for classifying cyberattacks and intrusions using the MITRE framework. The software is divided into three parts: a database for storing logs, an AI engine for classifying and mapping the logs, and a graphical user interface for visualizing the results. The AI engine will implement a natural language processing machine learning classifier to classify the logs, and experiments with named entity recognition techniques will be conducted to gain a better understanding of the logs. The ultimate goal of the project is to provide a user-friendly and efficient way of visualizing potential threats to a network using internet traffic logs.
- Published
- 2023
4. Optimizing energy market participation with batteries
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, i.LECO, Arias Vicente, Marta, Mihaylov, Mihail, and Cheaib, Alaa
- Abstract
Because the energy sector is in transition, there are goals for lowering energy costs through the use of renewables and batteries. This presents challenges to the system, and one solution is the establishment of energy communities that can make electricity provision cleaner and more secure. Another is to examine how energy flexibility elements, or elements on the consumption side, can make the system more efficient and cheaper, which this work does for the day-ahead bid and batteries. Traditional day-ahead bidding methods have become costly, mainly when the forecasted energy consumption differs from the actual consumption, a difference that is penalized with an imbalance cost. This thesis is part of a larger project (Layered Energy System) that is to be deployed in Spain. Applying such changes to the electricity system first requires becoming familiar with and understanding Spain's context. The first part of this thesis provides research to understand the Spanish regulatory framework, how the market works, and the status of these technologies in Spain. Following that, the primary work of this thesis is to explore how the day-ahead market bid could be improved through the use of batteries for better planning and error assumptions. It reviews several day-ahead bidding strategies in the context of energy and batteries, then selects a subset (three) of the studied strategies and implements them, comparing their performance on actual electricity data. Finally, it selects the one that best fits various scenarios and requirements. A particular objective function is minimized subject to the battery constraints that involve the variables. A linear program finds the values that best fit those variables at every time step $t$ of a single day. The methodology is an improvement over traditional predictive models. After comparing different strategies, results show that strategy one, namely "Stochastic Chance-constraint
- Published
- 2021
5. Teex: a toolbox for the evaluation of explanations
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Yunzhe Jia, Arias Vicente, Marta, and Antoñanzas Acero, Jesús Maria
- Abstract
In the machine learning (ML) community, models are developed, trained and deployed for many applications. Text-to-speech, product and media recommendation, medical aiding, environmental protection and many more are examples of current ML applications. But, more often than not, given the quality requirements for the applications, these models can become very complex. So complex, in fact, that the decisions they take are usually not understandable by humans. These are called black box models. So, given the clear problem of not trusting models' decisions because of the relevance of their impact and their low transparency, explanation methods (explainers) were born with the objective of distilling the factors that black box models take into account when making decisions into 'explanations', which humans can understand. Explanation methods can be categorized in many ways: for example, by the type of explanations they produce, the models they work on, their mechanisms for extracting information, or whether they try to characterize a model's whole behaviour (global explanations) or individual predictions (local explanations). Given the current rise of the field of Explainable AI (XAI), which is driven by necessity, researchers need a tool to easily and swiftly evaluate the performance of state-of-the-art explainer methods. On top of current evaluation techniques such as performing subjective human experiments or manually comparing the quality of explanations, we present a toolbox that adds another layer of credibility to part of XAI research. The toolbox is aimed at the automatic evaluation of local explanations via comparison to ground-truth explanations. Version 1.0 contains several evaluation metrics for different explanation types: saliency maps, decision rules and feature and word importance vectors. Moreover, the library also provides real-world and artificial data with available ground truth explanations so that users can easily benchmark l
- Published
- 2021
6. Extracting information from images to improve real estate marketplaces' experience
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Benchekroun, Youssef, and Bosch i Mustarós, Eduard
- Published
- 2021
7. Feature engineering, dimensionality reduction and interpretability through autoencoders for structured data
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Arratia Quesada, Argimiro Alejandro, and Bofarull Cabello, Antoni
- Abstract
Machine Learning is the area of Artificial Intelligence where algorithms learn from data. Therefore, making a good selection of features is essential for the models to perform their tasks in the best possible way. We employ a denoising autoencoder architecture and extend it to take advantage of the aggregation of features from different contexts using several dilated convolutions. We apply sparse group Lasso regularization to cluster them and automatically identify which ones are the most relevant, and also to the bottleneck neurons to determine whether we can further reduce the dimensionality. Besides reconstruction, we include an extra output from the bottleneck that performs classification. Multi-task learning leverages context-specific information that improves the quality of the encoding. Deep Learning models have commonly been considered black boxes. However, due to the significant difference in performance compared to interpretable models such as linear regression, this has not been a problem in contexts where understanding the models is not as relevant as obtaining good results. In this project, we study the interpretability of models by using the Shapley value method and its extensions. In the practical part, we have empirically studied the proposed model. The results show that the network architecture can identify the most relevant dilation. On the one hand, we can perform a global interpretation of the model by looking at the weights as we do in linear regression. The advantage over other models is that we group the weights by kernels of the dilated convolutions. On the other hand, through the input-output importance matrix using Shapley values, we can identify which parts of an instance are most relevant to reconstruct its output.
- Published
- 2021
8. Deep learning based Recommender System for an online retailer
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arratia Quesada, Argimiro Alejandro, Arias Vicente, Marta, and Breve Ramírez, Manuel Alejandro
- Abstract
Since Wide and Deep Learning for Recommender Systems appeared in 2016, multiple model architectures have been built around the idea of jointly training wide and deep neural networks, as this architecture allows the model to learn both memorization and generalization, which are critical for recommender systems. Could this kind of architecture change forever the way recommendation systems predict the preference of a user with respect to an item? To answer this question from our own experience, we explore, design, and reproduce a deep learning-based recommender system model, and train it with Camper's e-commerce dataset. We wanted to validate first-hand how good a wide and deep model can be, and how much it could improve on the accuracy of different baseline models. We ran two different experiments. The first model was trained to predict the rating with which a user would express their preference for a certain category of shoes, on a [1,5] scale, whereas the second model was trained to determine whether or not a user would have an interaction with a specific category of shoes. Our experimental results reveal that wide and deep models present slightly better but similar performance with respect to other deep learning models; however, for small to medium-sized datasets, or for datasets that do not have the most suitable feature variables for a recommendation problem, it would be better to use classic algorithms. Wide and deep models have a nice theoretical basis, but in practice the results only improve under certain circumstances and with huge amounts of data, and even then the improvement may not be that significant. Our results are an invitation not to neglect or ignore the nature of the data. Although deep learning models are considerably improving on multiple algorithms, they do not always perform better than simpler and well-known machine learning models which require less dat
- Published
- 2021
9. Smart rehabilitation
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Perez-Uribe, Andrés, and Sendino García, Víctor
- Abstract
This thesis is born from a collaboration project between the HEIG-VD and the CHUV hospital in Lausanne, Switzerland. We study the problem of human grasp recognition from first-person RGB video input data. Grasping is the action of seizing and firmly holding an object, and many different types of grasp exist. The objective is to use grasp recognition to automate the monitoring of the rehabilitation sessions of patients with upper-limb neurological disorders. We compared three different approaches based on Deep Learning. Firstly, a naive image model that is trained with the entire images. Secondly, a video model, which takes advantage of the temporal dimension in addition to the spatial features. Lastly, an image model that is trained with images cropped around the hands, so that it focuses only on the part that determines the grasp. We used the Yale Grasping Dataset for training the models. To enhance the interpretability of the results we proposed a coarse-grained grasp grouping based on the Feix grasp taxonomy. We also captured our own small first-person video grasp dataset to test the applicability of the models to our setup, which differs from the training dataset in the camera location and angle. Considering the intrinsic challenges of the data, such as the frequent hand-object occlusions, and the dataset difficulties, such as its real-world setting and the low video quality, the results are relatively good. Nevertheless, they are insufficient for deploying a satisfactory system at the hospital and underline the difficulty of grasp recognition from egocentric RGB data alone. It would be interesting to further research other data modalities such as depth data, or to study the problem from the perspective of hand pose estimation and object detection. It is also clear that the field lacks a larger, more modern dataset.
- Published
- 2021
10. A study of Deep Learning techniques for sequence-based problems
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Arratia Quesada, Argimiro Alejandro, and Quintana Valenzuela, Diego
- Abstract
Transformer Networks are a type of Deep Learning architecture first introduced in 2017. By applying only attention mechanisms, the transformer network can model relations between text sequences, and it has outperformed other models in natural language processing tasks such as language translation. In this work, we explore the capabilities of the transformer architecture to model sub-sequences of a time series, and we use this model to produce forecasts over longer horizons. We implement a transformer network model on a time series dataset that describes the daily aggregated sales of Camper, a shoes and apparel store. This model aims to capture the relation between two sub-sequences from the series and produces a forecast of a third sub-sequence in the future. We explore the different parts of the model and their relation to its performance, as well as the impact of modifying the shape of the input sequences used in training and inference. We use this model to forecast one year of data, and we compare these results with those of other, more classical approaches frequently used in time series forecasting, such as Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) networks. We further examine the capabilities of the model to exploit other features from the dataset, such as descriptors of the sales and temporal features from the target. Finally, we look at the attention maps produced by the attention mechanism implemented in the model and discuss its capability to explain the forecasts it produces. Our implementation shows that the model can exploit temporal features and produce forecasts that improve on the proposed benchmarks in most scenarios, and that the attention plots produced provide some explainability guidelines that could be further explored.
- Published
- 2021
11. Weakly-supervised object detection using explanation methods
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Lim Jin Sean, Nick, and Adell Ripollés, Víctor
- Abstract
In this thesis we explore the object detection task with weak supervision. We propose and evaluate an alternative method to generate bounding boxes directly from image explanations on architectures based on both convolutions and transformers; the method does not rely on object proposals and is more efficient in terms of memory consumption than the current state of the art. Finally, motivated by a use case with environmental data, we explore an architecture based on vision transformers that does not require any kind of labels.
- Published
- 2021
12. Data engineering for cost reduction, efficiency improvement, and business intelligence for an e-commerce company
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Puig Ramirez, Joaquim, and Carrillo Alza, Alex
- Published
- 2021
13. Análisis de expectativas: Plan de Comunicación de una empresa [Expectations analysis: a company's communication plan]
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Montes Martínez, José Luis, and Yang, Ying Ana
- Published
- 2021
14. Data analysis of socio-economic and financial factors from a public world-wide source
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, and Calvo Fantova, Santiago
- Abstract
Many socio-economic studies nowadays try to give a complete description of how the different elements of our society and world are connected. This work is an attempt to build an architecture that explains the connections and impacts that exist among the different indicators (from now on also referred to as sectors) of a state or population (Agriculture, Climate Change, Economy & Growth, Energy & Mining, Education, Health, Poverty, Science & Technology, Social Development, and others). We focus our effort on researching not only the correlations that may exist between these indicators and across the different countries analyzed, but also the causality that relates them. With causality (we deploy a Bayesian Network architecture for each country to accomplish this task), we will be able to describe the impact and influence that one indicator may have on the others. This could lead to accurate, powerful and global knowledge of the functioning of our world and of each country in particular, along with a view of the dependencies between the different indicators that describe a country. Finally, we also propose a clustering model where each individual is a representation of the Bayesian Network obtained for each country. With this model, we provide N groupings of countries, each with its Bayesian Network representation, which gives us a global view of the functioning of our world represented by the causal relationships between the different indicators that can be found in our countries or populations.
- Published
- 2020
15. Knowledge-based segmentation to improve accuracy and explainability in non-technical losses detection
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Computació, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. ALBCOM - Algorismia, Bioinformàtica, Complexitat i Mètodes Formals, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Calvo Ibáñez, Albert, Coma Puig, Bernat, Carmona Vargas, Josep, and Arias Vicente, Marta
- Abstract
Utility companies have a great interest in identifying energy losses. Here, we focus on Non-Technical Losses (NTL), which refer to losses caused by utility theft or meter errors. Typically, utility companies resort to machine learning solutions to automate and optimise the identification of such losses. This paper extends an existing NTL-detection framework: by including knowledge-based NTL segmentation, we have detected some opportunities for improving the accuracy and the explanations provided to the utility company. Our improved models focus on specific types of NTL and therefore, the explanations provided are easier to interpret, allowing stakeholders to make more informed decisions. The improvements and results presented in the article may benefit other industrial frameworks., This work has been supported by MINECO and FEDER funds under grants TIN2017-86727-C2-1-R and TIN2017-89244-R, the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya)., Peer Reviewed, Sustainable Development Goals::9 - Industry, Innovation and Infrastructure, Postprint (published version)
- Published
- 2020
16. Disseny i implementació d'un sistema ETL en el context d'una fintech [Design and implementation of an ETL system in the context of a fintech]
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Lao Monreal, Sergi, and Gomez Esteve, Miquel
- Published
- 2020
17. Massive data processing for data analysis and visualization to help understand sector trends and monitor sales
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Tandonnet, Charles, and Gazel-Anthoine, Paul
- Published
- 2020
18. Creating a model for expected Goals in football using qualitative player information
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Fernández, Javier, and Madrero Pardo, Pau
- Abstract
The field of sports analytics has grown considerably in recent years. Sports like baseball and basketball were among the first to embrace it, but football has also taken big steps in that direction. One of the causes is that data analysis allows for the development of new advanced metrics which can provide a competitive advantage. This project presents a new version of one of these advanced metrics applied to football, Expected Goals. The metric estimates how likely it is for a shot to end up becoming a goal. We present two different approaches for building the predictors: one that uses qualitative player information and another that is player-agnostic. We then reflect on the importance of the calibration of the probabilities yielded by the models, as well as their possible interpretations, and present some of the applications that can be used to evaluate team and player performance. We also show the impact each feature has on the models to make their outputs interpretable and to demonstrate that the addition of the qualitative player information is important for the performance of the model.
- Published
- 2020
19. Visual Search: finding similar images
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, and Camli, Gorkem
- Abstract
The Visual Search task focuses on finding visually similar images given a query image and returning the results in ranked order, with the most similar images ranked first. The main contributions of this thesis are implementing an end-to-end system to perform visual search that can be used in further research or applications, and conducting experiments on different types of feature extraction and dimensionality reduction methods to understand which ones are more likely to give better search relevance and quality results.
- Published
- 2020
20. Automatic organizing of user travel items
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Cugat, Josep, and Torres Bellido, Bernat
- Published
- 2020
21. Roots of Trumpism: Homophily and Social Feedback in Donald Trump Support on Reddit
- Author
-
Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics, ISI Foundation, Bonchi, Francesco, Monti, Corrado, De Francisci Morales, Gianmarco, Arias Vicente, Marta, and Massachs Güell, Joan
- Abstract
We study the emergence of support for Donald Trump in Reddit’s political discussion. With almost 800k subscribers, “r/The_Donald” is one of the largest communities on Reddit, and one of the main hubs for Trump supporters. It was created in 2015, shortly after Donald Trump began his presidential campaign. By using only data from 2012, we predict the likelihood of being a supporter of Donald Trump in 2016, the year of the last US presidential elections. To characterize the behavior of Trump supporters, we draw from three different sociological hypotheses: homophily, social influence, and social feedback. We operationalize each hypothesis as a set of features for each user, and train classifiers to predict their participation in r/The_Donald. We find that homophily-based and social feedback-based features are the most predictive signals. Conversely, we do not observe a strong impact of social influence mechanisms. We also perform an introspection of the best-performing model to build a “persona” of the typical supporter of Donald Trump on Reddit. We find evidence that the most prominent traits include a predominance of masculine interests, a conservative and libertarian political leaning, and links with politically incorrect and conspiratorial content., Outgoing
- Published
- 2020
22. Characterizing transactional databases for frequent itemset mining
- Author
-
Lezcano Ríos, Christian Gerardo, Arias Vicente, Marta, Universitat Politècnica de Catalunya. Doctorat en Computació, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge
- Subjects
Bases de dades ,Databases ,Transactional databases ,Informàtica::Intel·ligència artificial::Aprenentatge automàtic [Àrees temàtiques de la UPC] ,Machine learning ,Aprenentatge automàtic ,Frequent itemset mining ,Mineria de dades ,Data mining ,Data characterization
- Abstract
This paper presents a study of the characteristics of transactional databases used in frequent itemset mining. Such characterizations have typically been used to benchmark and understand the data mining algorithms working on these databases. The aim of our study is to give a picture of how diverse and representative these benchmarking databases are, both in general and in the context of particular empirical studies found in the literature. Our proposed list of metrics contains many of the existing metrics found in the literature, as well as new ones. Our study shows that our list of metrics is able to capture much of the datasets’ inner complexity and thus provides a good basis for the characterization of transactional datasets. Finally, we provide a set of representative datasets based on our characterization that may safely be used as a benchmark. Both authors have been partially supported by TIN2017-89244-R from MINECO (Spain’s Ministerio de Economia, Industria y Competitividad) and the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). Christian Lezcano is supported by Paraguay’s Foreign Postgraduate Scholarship Programme Don Carlos Antonio López (BECAL).
- Published
- 2019
23. Challenging the generalization capabilities of Graph Neural Networks for network modeling
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CBA - Sistemes de Comunicacions i Arquitectures de Banda Ampla, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Suárez-Varela Maciá, José Rafael, Carol Bosch, Sergi, Rusek, Krzysztof, Almasan Puscas, Felician Paul, Arias Vicente, Marta, Barlet Ros, Pere, and Cabellos Aparicio, Alberto
- Abstract
Today, network operators still lack functional network models able to make accurate predictions of end-to-end Key Performance Indicators (e.g., delay or jitter) at limited cost. Recently, a novel Graph Neural Network (GNN) model called RouteNet was proposed as a cost-effective alternative to estimate the per-source/destination pair mean delay and jitter in networks. Thanks to its GNN architecture that operates over graph-structured data, RouteNet revealed an unprecedented ability to learn and model the complex relationships among topology, routing and input traffic in networks. As a result, it was able to make performance predictions with accuracy similar to that of resource-hungry packet-level simulators, even in network scenarios unseen during training. In this demo, we will challenge the generalization capabilities of RouteNet with more complex scenarios, including larger topologies., This work was supported by the Spanish MINECO under contract TEC2017-90034-C2-1-R (ALLIANCE), the Catalan Institution for Research and Advanced Studies (ICREA) and the AGH University of Science and Technology grant, under contract no. 15.11.230.400. The research was also supported in part by PL-Grid Infrastructure., Peer Reviewed, Postprint (author's final draft)
- Published
- 2019
24. Graph Neural Networks and its applications
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, and Rodríguez Esmerats, Pau
- Abstract
This project will explore some of the most prominent Graph Neural Network variants and apply them to two tasks: approximating the Girvan-Newman community detection algorithm and classifying compiled code snippets.
- Published
- 2019
25. A benchmark for graph neural networks for computer network modeling
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Cabellos Aparicio, Alberto, Arias Vicente, Marta, and Carol Bosch, Sergi
- Abstract
Today, network operators still lack functional network models able to make accurate predictions of end-to-end Key Performance Indicators (e.g., delay). This thesis introduces a benchmark for computer network modeling using the RouteNet Graph Neural Network, as well as a routing creation algorithm.
- Published
- 2019
26. Optimization of the search engine ElasticSearch
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Orange, Ecole d’Ingénieurs d’Informatique et Système d’Information en Santé, Arias Vicente, Marta, Soto-Romero, Georges, Naamani, Karim, and Coviaux, Quentin
- Abstract
This thesis presents the work done in the Search on Demand team at Orange. It covers the optimization of the search engine Elasticsearch, ways to bring data into it by means of an ETL process, and how relevance can be tuned using Lucene's inverted indices.
- Published
- 2019
27. Synthetic dataset generation with itemset-based generative models
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Computació, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Lezcano Ríos, Christian Gerardo, and Arias Vicente, Marta
- Abstract
This paper proposes three different data generators, tailored to transactional datasets, based on existing itemset-based generative models. All these generators are intuitive and easy to implement and show satisfactory performance. The quality of each generator is assessed by means of three different methods that capture how well the original dataset structure is preserved., Both authors have been partially supported by TIN2017-89244-R from MINECO (Spain’s Ministerio de Economia, Industria y Competitividad) and the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). Christian Lezcano is supported by Paraguay’s Foreign Postgraduate Scholarship Programme Don Carlos Antonio López (BECAL)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2019
28. Characterizing transactional databases for frequent itemset mining
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Computació, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Lezcano Ríos, Christian Gerardo, and Arias Vicente, Marta
- Abstract
This paper presents a study of the characteristics of transactional databases used in frequent itemset mining. Such characterizations have typically been used to benchmark and understand the data mining algorithms working on these databases. The aim of our study is to give a picture of how diverse and representative these benchmarking databases are, both in general and in the context of particular empirical studies found in the literature. Our proposed list of metrics contains many of the existing metrics found in the literature, as well as new ones. Our study shows that our list of metrics is able to capture much of the datasets’ inner complexity and thus provides a good basis for the characterization of transactional datasets. Finally, we provide a set of representative datasets based on our characterization that may safely be used as a benchmark., Both authors have been partially supported by TIN2017-89244-R from MINECO (Spain’s Ministerio de Economia, Industria y Competitividad) and the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). Christian Lezcano is supported by Paraguay’s Foreign Postgraduate Scholarship Programme Don Carlos Antonio López (BECAL)., Peer Reviewed, Postprint (published version)
- Published
- 2019
29. Implementació de la funcionalitat offline d'un sistema Point-Of-Sale
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Alonso Bohigas, Gerard, and Solís Gilabert, Roger
- Published
- 2019
30. Analysis on distance metrics approaches in graphs and their applications
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Arratia Quesada, Argimiro Alejandro, and Cebollero Ruiz, Laura
- Published
- 2019
31. Plataforma NIRS PAT para la industria 4.0 con Tensorflow
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, IRIS, Arias Vicente, Marta, Rosales Lavielle, Alejandro Alberto, and Chamizo Álvarez, Víctor
- Abstract
Using the TensorFlow deep learning platform, classification and quantification models were obtained from spectroscopic data. These models were compared with the traditional analytical methods of chemometrics to see whether they are the way forward for Industry 4.0.
- Published
- 2019
32. Learning complex games through self play - Pokémon battles
- Author
-
Cabellos Aparicio, Alberto, Arias Vicente, Marta, Giró Nieto, Xavier, and Llobet Sanchez, Miquel
- Abstract
In this project we analyze the feasibility of using reinforcement learning and self-play to train an agent to play Pokémon Battles. The game is analyzed in depth, and its unique properties and challenges are revealed. The project surveys different reinforcement learning libraries.
- Published
- 2018
33. Automated construction and analysis of political networks via open government and media sources
- Author
-
García-Olano, Diego, Arias Vicente, Marta (ORCID 0000-0001-7359-1815), Larriba Pey, Josep (ORCID 0000-0002-7070-9256), Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, and Universitat Politècnica de Catalunya. DAMA-UPC - Data Management Group
- Subjects
ComputingMilieux_THECOMPUTINGPROFESSION ,Text mining ,Informàtica::Sistemes d'informació [Àrees temàtiques de la UPC] ,Network science ,Open data ,Mineria de dades ,Data mining ,Political science ,Ciències polítiques
- Abstract
We present a tool to generate real-world political networks from user-provided lists of politicians and news sites. Additional output includes visualizations, interactive tools and maps that allow a user to better understand the politicians and their surrounding environments as portrayed by the media. As a case study, we construct a comprehensive list of current Texas politicians, select news sites that convey a spectrum of political viewpoints covering Texas politics, and examine the results. We propose a “Combined” co-occurrence distance metric to better reflect the relationship between two entities. A topic modeling technique is also proposed as a novel, automated way of labeling communities that exist within a politician’s “extended” network.
- Published
- 2016
34. Semblant cerca semblant?: la formació de grups de treball en la pràctica de la programació
- Author
-
Sanou Gozalo, Eduard, Arias Vicente, Marta, Ferrer Cancho, Ramon, Hernández Fernández, Antonio, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació, and Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge
- Subjects
Ensenyament de la programació ,Grau d'informàtica ,Electronic data processing -- Study and teaching (Higher) ,Ensenyament i aprenentatge::Ensenyament universitari [Àrees temàtiques de la UPC] ,Programació per parelles ,Informàtica -- Ensenyament universitari -- Problemes, exercicis, etc ,Formació d'equips ,Treball en parelles ,Neuroeducació ,Ensenyament i aprenentatge::Metodologies docents [Àrees temàtiques de la UPC]
- Abstract
In a course of the computer science degree, the programming project has changed from individual work to team work, in principle in pairs (pair programming). Students have full freedom to form teams, with minimal intervention from the teaching staff. The analysis of the pairs formed indicates that students do not tend to associate with students of similar academic performance, perhaps because general cognitive parameters do not govern the choice of academic partner.
- Published
- 2016
35. Identifiability and transportability in dynamic causal networks
- Author
-
Blondel, Gilles, Arias Vicente, Marta, Gavaldà Mestre, Ricard, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge
- Subjects
Graph theory ,Belief networks ,Dynamic Bayesian networks ,Informàtica::Informàtica teòrica [Àrees temàtiques de la UPC] ,Standard causal graphs ,Grafs, Teoria de ,Dynamic causal networks
- Abstract
In this paper we propose a causal analog to the purely observational Dynamic Bayesian Networks, which we call Dynamic Causal Networks. We provide a sound and complete algorithm for identification of Dynamic Causal Networks, namely, for computing the effect of an intervention or experiment, based on passive observations only, whenever possible. We note the existence of two types of confounder variables that affect in substantially different ways the identification procedures, a distinction with no analog in either Dynamic Bayesian Networks or standard causal graphs. We further propose a procedure for the transportability of causal effects in Dynamic Causal Network settings, where the result of causal experiments in a source domain may be used for the identification of causal effects in a target domain.
- Published
- 2016
36. Does like seek like?: the formation of working groups in a programming project
- Author
-
Sanou Gozalo, Eduard, Hernández Fernández, Antonio, Arias Vicente, Marta, and Ferrer Cancho, Ramon
- Abstract
In a course of the computer science degree, the programming project has changed from individual work to team work, tentatively in pairs (pair programming). Students have full freedom to team up with minimum intervention from teachers. The analysis of the pairs formed indicates that students do not tend to associate with students with a similar academic performance, maybe because general cognitive parameters do not govern the choice of academic partners. Pair programming seems to give great results, so the efforts of future research in this field should focus precisely on how these pairs are formed, underpinning the mechanisms of human social interactions., Peer Reviewed
- Published
- 2017
37. Learning definite Horn formulas from closure queries
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Arias Vicente, Marta, Balcázar Navarro, José Luis, and Tîrnauca, Cristina
- Abstract
A definite Horn theory is a set of n-dimensional Boolean vectors whose characteristic function is expressible as a definite Horn formula, that is, as a conjunction of definite Horn clauses. The class of definite Horn theories is known to be learnable under different query learning settings, such as learning from membership and equivalence queries or learning from entailment. We propose yet another type of query: the closure query. Closure queries are a natural extension of membership queries and also a variant, appropriate in the context of definite Horn formulas, of the so-called correction queries. We present an algorithm that learns conjunctions of definite Horn clauses in polynomial time, using closure and equivalence queries, and show how it relates to the canonical Guigues–Duquenne basis for implicational systems. We also show how the different query models mentioned relate to each other, either by showing full-fledged reductions by means of query simulation (where possible), or by showing their connections in the context of particular algorithms that use them for learning definite Horn formulas., Peer Reviewed, Postprint (author's final draft)
- Published
- 2017
38. Does like seek like? The formation of working groups in a programming project
- Author
-
Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Sanou Gozalo, Eduard, Hernández Fernández, Antonio, Arias Vicente, Marta, and Ferrer Cancho, Ramon
- Abstract
In a course of the degree of computer science, the programming project has changed from individual to teamed work, tentatively in couples (pair programming). Students have full freedom to team up with minimum intervention from teachers. The analysis of the working groups made indicates that students do not tend to associate with students with a similar academic performance, perhaps because general cognitive parameters do not drive the choice of academic partners. Pair programming seems to give great results, so the efforts of future research in this field should focus precisely on how these pairs are formed, underpinning the mechanisms of human social interactions., Peer Reviewed, Postprint (published version)
- Published
- 2017
39. Classifier selection with permutation tests
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Arias Vicente, Marta, Arratia Quesada, Argimiro Alejandro, and Duarte López, Ariel
- Abstract
This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of which classifier is likely to perform best is made based on classifier performance over similar known data sets. This similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels, and its comparison with the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we have conducted extensive experiments including 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations., Peer Reviewed, Postprint (author's final draft)
- Published
- 2017
40. Identifiability and transportability in dynamic causal networks
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Blondel, Gilles, Arias Vicente, Marta, and Gavaldà Mestre, Ricard
- Abstract
In this paper, we propose a causal analog to the purely observational dynamic Bayesian networks, which we call dynamic causal networks. We provide a sound and complete algorithm for the identification of causal effects in dynamic causal networks, namely for computing the effect of an intervention or experiment given a dynamic causal network and probability distributions of passive observations of its variables, whenever possible. We note the existence of two types of hidden confounder variables that affect in substantially different ways the identification procedures, a distinction with no analog in either dynamic Bayesian networks or standard causal graphs. We further propose a procedure for the transportability of causal effects in dynamic causal network settings, where the result of causal experiments in a source domain may be used for the identification of causal effects in a target domain., Peer Reviewed, Postprint (author's final draft)
- Published
- 2017
41. Sentiment analysis on Twitter
- Author
-
ServiZurich, Arias Vicente, Marta, Balcázar Navarro, José Luis, Tolos Rigueiro, Marta, and Proscia, Rocco
- Abstract
In recent years, more and more people have been connecting through social networks, and Twitter is one of the most widely used. This huge amount of information is attracting the interest of companies, in part because it can be used to detect public opinion about their brands and thus improve their business value. Transforming the information present in social networks into knowledge requires several steps. This project aims to describe them and to provide tools able to perform this task. The first problem is how to retrieve the data. Several ways are available, each with its own pros and cons. It is then necessary to study and define proper queries in order to retrieve the information needed. Once the data is retrieved, it may need to be filtered and explored. For this task, a topic model algorithm (LDA) has been studied and analyzed. LDA has shown positive results when it is properly tuned and combined with appropriate visualization techniques. The difference between a topic model algorithm and other clustering/segmentation techniques is that topic models allow each “document” (instance) to belong to more than one topic (cluster). LDA does not natively work well on Twitter because tweets are very short; an investigation of the literature has revealed a solution to this problem. Another problem common in clustering is how to validate the algorithm and how to choose the proper number of topics (clusters); several metrics from the literature have been explored for this purpose. Afterwards, sentiment analysis techniques can be applied to measure the opinion of the users. The literature presents several approaches to solving this problem. This work focuses on the polarity detection task with three classes, that is, classifying whether a tweet expresses a positive, a negative, or a neutral sentiment. Here, reaching accurate results can be challenging, due to the mes
- Published
- 2017
42. Does training affect match performance? A study using data mining and tracking devices
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Fernández, Javier, Medina Leal, Daniel, Gómez, Antonio, Arias Vicente, Marta, and Gavaldà Mestre, Ricard
- Abstract
FIFA has recently allowed the use of electronic performance and tracking systems (EPTS) in professional football competition, providing teams with novel and more accurate data. Physical performance has not yet received much attention from the research community, due to the difficulty of accessing this information with the same devices during training and competition. This study provides a methodology based on machine learning and statistical methods to relate the variation in players' physical performance during time-framed training sessions to their performance in the following matches. The analysis is carried out over F.C. Barcelona B data from the 2015-2016 season, and emphasizes exploiting the design characteristics of the structured training methodology implemented within the club. The use of summarized physical variation data has revealed a remarkable relation between higher magnitudes of variation in 3-week time frames during training and higher physical values in the following matches. With increased data availability, this and new approaches could open a new frontier in physical performance analysis. This is, to our knowledge, the first study to relate training and match performance through the same EPTS devices in professional football., Peer Reviewed, Postprint (published version)
- Published
- 2016
43. From training to match performance: A predictive and explanatory study on novel tracking data
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Fernández, Javier, Medina, Daniel, Gómez, Antonio, Arias Vicente, Marta, and Gavaldà Mestre, Ricard
- Abstract
The recent FIFA approval of the use of Electronic Performance and Tracking Systems (EPTS) during competition has made novel data on players' physical performance available. The analysis of this kind of information will provide teams with competitive advantages, by giving them a deeper understanding of the relation between training and match load and of individual players' fitness characteristics. To make sense of this inherently complex physical data, machine learning algorithms that exploit both non-linear and linear relations among variables could be of great help in building predictive and explanatory models. Also, the increasing availability of information brings both the need and the challenge of interpreting these models successfully, so that the findings can be translated into information that fast-paced practitioners, such as physical coaches, can apply quickly. For the 2015-2016 season, F.C. Barcelona collected physical information from both training sessions and matches using EPTS devices. This study focuses primarily on evaluating to what extent it is possible to predict match performance from training and match physical information. Different machine learning algorithms are applied to build predictive regression models, in combination with feature selection techniques and Principal Component Analysis (PCA) for dimensionality reduction. Physical variables are segmented into three groups: locomotor, metabolic and mechanical variables, reaching successful prediction rates in 11 out of 17 total variables, based on a threshold determined by expert physical coaches. A normalized root mean square error metric is proposed that allows practitioners to better understand the results. The second part of this study focuses on understanding the predictor variables that best explain each of the 17 analyzed match variables.
It was found that specific variables can act as representatives of the set of high, Peer Reviewed, Postprint (published version)
- Published
- 2016
44. Geospatial search engine
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Poyato, Ricard, and Sendra Garcia, Pol
- Published
- 2016
45. Graph and matrix algorithms for visualizing high dimensional data
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Gavaldà Mestre, Ricard, Arias Vicente, Marta, and Shankaranarayanan Venkataraman, Abhinav
- Abstract
Motivated by the problem of understanding data from the medical domain, we consider algorithms for visually representing high-dimensional data so that "similar" entities appear close together. We will study, implement and compare several algorithms based on graph and on matrix representation
- Published
- 2016
46. GeoSRS: a hybrid social recommender system for geolocated data
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Capdevila Pujol, Joan, Arias Vicente, Marta, and Arratia Quesada, Argimiro Alejandro
- Abstract
All rights reserved. We present GeoSRS, a hybrid recommender system for a popular location-based social network (LBSN), in which users are able to write short reviews on the places of interest they visit. Using state-of-the-art text mining techniques, our system recommends locations to users, using as its source the whole set of text reviews in addition to their geographical location. To evaluate our system, we have collected our own data sets by crawling the social network Foursquare. To do this efficiently, we propose the use of a parallel version of the Quadtree technique, which may be applicable to crawling/exploring other spatially distributed sources. Finally, we study the performance of GeoSRS on our collected data set and conclude that by combining sentiment analysis and text modeling, GeoSRS generates more accurate recommendations. The performance of the system improves as more reviews are available, which further motivates the use of large-scale crawling techniques such as the Quadtree., Preprint
- Published
- 2016
47. Prototipo de clustering orientado motor de búsqueda
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Sogeti High Tech, Arias Vicente, Marta, and Blanco-Hermida Sanz, Eric-Joel
- Abstract
Project carried out at a company. Study of the possibilities of clustering and machine learning algorithms applied to the social network Twitter. Distinguish tweets that talk about the company Orange from those that do not. Perform sentiment analysis and clustering on the tweets to extract information., Project realized at an enterprise. Study on clustering and machine learning algorithms applied to Twitter. Using supervised learning algorithms, be able to tell apart tweets that talk about Orange from the ones that don't. Sentiment analysis on the tweets to see whether they are talking positively o
- Published
- 2016
48. From training to match performance: an exploratory and predictive analysis on F.C. Barcelona GPS data
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Gavaldà Mestre, Ricard, and Fernández, Javier
- Abstract
An exploratory and predictive analysis on GPS data is presented. Physical performance variables from professional football players are analysed in a holistic approach that involves data exploration, analysis of adaptation through clustering, and predictive models for estimating future performance.
- Published
- 2016
49. Recommender system as viral lever
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Arias Vicente, Marta, Valverde Arredondo, Fernando, and Megias Duran, Iván
- Published
- 2016
50. Automated construction and analysis of political networks via open government and media sources
- Author
-
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. LARCA - Laboratori d'Algorísmia Relacional, Complexitat i Aprenentatge, Universitat Politècnica de Catalunya. DAMA-UPC - Data Management Group, García-Olano, Diego, Arias Vicente, Marta, and Larriba Pey, Josep
- Abstract
We present a tool to generate real-world political networks from user-provided lists of politicians and news sites. Additional output includes visualizations, interactive tools and maps that allow a user to better understand the politicians and their surrounding environments as portrayed by the media. As a case study, we construct a comprehensive list of current Texas politicians, select news sites that convey a spectrum of political viewpoints covering Texas politics, and examine the results. We propose a "Combined" co-occurrence distance metric to better reflect the relationship between two entities. A topic modeling technique is also proposed as a novel, automated way of labeling communities that exist within a politician's "extended" network., Peer Reviewed, Postprint (author's final draft)
- Published
- 2016