Author: "Smyth, Barry" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

1. An Analysis of the Impact of Gold Open Access Publications in Computer Science

Author: Cunningham, Padraig and Smyth, Barry
Subjects: Computer Science - Digital Libraries
Abstract: There has been some concern about the impact of predatory publishers on scientific research for some time. Recently, publishers that might previously have been considered 'predatory' have established their bona fides, at least to the extent that they are included in citation impact scores such as the field-weighted citation impact (FWCI). These are sometimes called 'grey' publishers (MDPI, Frontiers, Hindawi). In this paper, we show that the citation landscape for these grey publications is significantly different from the mainstream landscape and that affording publications in these venues the same status as publications in mainstream journals may significantly distort metrics such as the FWCI., Comment: 12 pages, 8 figures
Published: 2024

2. Contrastive Learning of Asset Embeddings from Financial Time Series

Author: Dolphin, Rian, Smyth, Barry, and Dong, Ruihai
Subjects: Computer Science - Machine Learning, Quantitative Finance - Statistical Finance
Abstract: Representation learning has emerged as a powerful paradigm for extracting valuable latent features from complex, high-dimensional data. In financial domains, learning informative representations for assets can be used for tasks like sector classification, and risk management. However, the complex and stochastic nature of financial markets poses unique challenges. We propose a novel contrastive learning framework to generate asset embeddings from financial time series data. Our approach leverages the similarity of asset returns over many subwindows to generate informative positive and negative samples, using a statistical sampling strategy based on hypothesis testing to address the noisy nature of financial data. We explore various contrastive loss functions that capture the relationships between assets in different ways to learn a discriminative representation space. Experiments on real-world datasets demonstrate the effectiveness of the learned asset embeddings on benchmark industry classification and portfolio optimization tasks. In each case our novel approaches significantly outperform existing baselines highlighting the potential for contrastive learning to capture meaningful and actionable relationships in financial data., Comment: 9 pages, 4 figures, 4 tables
Published: 2024

3. Facilitating Interdisciplinary Knowledge Transfer with Research Paper Recommender Systems

Author: Cunningham, Eoghan, Greene, Derek, and Smyth, Barry
Subjects: Computer Science - Information Retrieval, Computer Science - Digital Libraries
Abstract: In the extensive recommender systems literature, novelty and diversity have been identified as key properties of useful recommendations. However, these properties have received limited attention in the specific sub-field of research paper recommender systems. In this work, we argue for the importance of offering novel and diverse research paper recommendations to scientists. This approach aims to reduce siloed reading, break down filter bubbles, and promote interdisciplinary research. We propose a novel framework for evaluating the novelty and diversity of research paper recommendations that leverages methods from network analysis and natural language processing. Using this framework, we show that the choice of representational method within a larger research paper recommendation system can have a measurable impact on the nature of downstream recommendations, specifically on their novelty and diversity. We highlight a novel paper embedding method, which we demonstrate offers more innovative and diverse recommendations without sacrificing precision, compared to other state-of-the-art baselines., Comment: Under Review at QSS
Published: 2023

4. Industry Classification Using a Novel Financial Time-Series Case Representation

Author: Dolphin, Rian, Smyth, Barry, and Dong, Ruihai
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Finance - Statistical Finance
Abstract: The financial domain has proven to be a fertile source of challenging machine learning problems across a variety of tasks including prediction, clustering, and classification. Researchers can access an abundance of time-series data and even modest performance improvements can be translated into significant additional value. In this work, we consider the use of case-based reasoning for an important task in this domain, by using historical stock returns time-series data for industry sector classification. We discuss why time-series data can present some significant representational challenges for conventional case-based reasoning approaches, and in response, we propose a novel representation based on stock returns embeddings, which can be readily calculated from raw stock returns data. We argue that this representation is well suited to case-based reasoning and evaluate our approach using a large-scale public dataset for the industry sector classification task, demonstrating substantial performance improvements over several baselines using more conventional representations., Comment: 15 pages
Published: 2023

5. Item Graph Convolution Collaborative Filtering for Inductive Recommendations

Author: D'Amico, Edoardo, Muhammad, Khalil, Tragos, Elias, Smyth, Barry, Hurley, Neil, and Lawlor, Aonghus
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Graph Convolutional Networks (GCN) have been recently employed as a core component in the construction of recommender system algorithms, interpreting user-item interactions as the edges of a bipartite graph. However, in the absence of side information, the majority of existing models adopt an approach of randomly initialising the user embeddings and optimising them throughout the training process. This strategy makes these algorithms inherently transductive, curtailing their ability to generate predictions for users that were unseen at training time. To address this issue, we propose a convolution-based algorithm, which is inductive from the user perspective, while at the same time, depending only on implicit user-item interaction data. We propose the construction of an item-item graph through a weighted projection of the bipartite interaction network and to employ convolution to inject higher order associations into item embeddings, while constructing user representations as weighted sums of the items with which they have interacted. Despite not training individual embeddings for each user, our approach achieves state-of-the-art recommendation performance with respect to transductive baselines on four real-world datasets, showing at the same time robust inductive performance.
Published: 2023
Full Text: View/download PDF

6. A Multimodal Embedding-Based Approach to Industry Classification in Financial Markets

Author: Dolphin, Rian, Smyth, Barry, and Dong, Ruihai
Subjects: Quantitative Finance - Statistical Finance
Abstract: Industry classification schemes provide a taxonomy for segmenting companies based on their business activities. They are relied upon in industry and academia as an integral component of many types of financial and economic analysis. However, even modern classification schemes have failed to embrace the era of big data and remain a largely subjective undertaking prone to inconsistency and misclassification. To address this, we propose a multimodal neural model for training company embeddings, which harnesses the dynamics of both historical pricing data and financial news to learn objective company representations that capture nuanced relationships. We explain our approach in detail and highlight the utility of the embeddings through several case studies and application to the downstream task of industry classification., Comment: 8 pages. Accepted at AICS 2022 under title "A Machine Learning Approach to Industry Classification in Financial Markets". Preliminary version under this title was discussed at ICAIF '22 Workshop on NLP and Network Analysis in Financial Applications. arXiv admin note: text overlap with arXiv:2202.08968
Published: 2022

7. Author Multidisciplinarity and Disciplinary Roles in Field of Study Networks

Author: Cunningham, Eoghan, Smyth, Barry, and Greene, Derek
Subjects: Computer Science - Digital Libraries, Computer Science - Social and Information Networks
Abstract: When studying large research corpora, "distant reading" methods are vital to understand the topics and trends in the corresponding research space. In particular, given the recognised benefits of multidisciplinary research, it may be important to map schools or communities of diverse research topics, and to understand the multidisciplinary role that topics play within and between these communities. This work proposes Field of Study (FoS) networks as a novel network representation for use in scientometric analysis. We describe the formation of FoS networks, which relate research topics according to the authors who publish in them, from corpora of articles in which fields of study can be identified. FoS networks are particularly useful for the distant reading of large datasets of research papers when analysed through the lens of exploring multidisciplinary science. In an evolving scientific landscape, modular communities in FoS networks offer an alternative categorisation strategy for research topics and sub-disciplines, when compared to traditional prescribed discipline classification schemes. Furthermore, structural role analysis of FoS networks can highlight important characteristics of topics in such communities. To support this, we present two case studies which explore multidisciplinary research in corpora of varying size and scope; namely, 6,323 articles relating to network science research and 4,184,011 articles relating to research on the COVID-19-pandemic.
Published: 2022

8. Stock Embeddings: Learning Distributed Representations for Financial Assets

Author: Dolphin, Rian, Smyth, Barry, and Dong, Ruihai
Subjects: Quantitative Finance - Statistical Finance, Computer Science - Machine Learning, Quantitative Finance - Computational Finance
Abstract: Identifying meaningful relationships between the price movements of financial assets is a challenging but important problem in a variety of financial applications. However, with recent research, particularly work using machine learning and deep learning techniques, focused mostly on price forecasting, the literature investigating the modelling of asset correlations has lagged somewhat. To address this, inspired by recent successes in natural language processing, we propose a neural model for training stock embeddings, which harnesses the dynamics of historical returns data in order to learn the nuanced relationships that exist between financial assets. We describe our approach in detail and discuss a number of ways that it can be used in the financial domain. Furthermore, we present the evaluation results to demonstrate the utility of this approach, compared to several important benchmarks, in two real-world financial analytics tasks., Comment: Currently under review. 9 pages, 4 figures
Published: 2022

9. NumHTML: Numeric-Oriented Hierarchical Transformer Model for Multi-task Financial Forecasting

Author: Yang, Linyi, Li, Jiazheng, Dong, Ruihai, Zhang, Yue, and Smyth, Barry
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Economics - Econometrics
Abstract: Financial forecasting has been an important and active area of machine learning research because of the challenges it presents and the potential rewards that even minor improvements in prediction accuracy or forecasting may entail. Traditionally, financial forecasting has heavily relied on quantitative indicators and metrics derived from structured financial statements. Earnings conference call data, including text and audio, is an important source of unstructured data that has been used for various prediction tasks using deep learning and related approaches. However, current deep learning-based methods are limited in the way that they deal with numeric data; numbers are typically treated as plain-text tokens without taking advantage of their underlying numeric structure. This paper describes a numeric-oriented hierarchical transformer model to predict stock returns and financial risk using multi-modal aligned earnings calls data by taking advantage of the different categories of numbers (monetary, temporal, percentages etc.) and their magnitude. We present the results of a comprehensive evaluation of NumHTML against several state-of-the-art baselines using a real-world publicly available dataset. The results indicate that NumHTML significantly outperforms the current state-of-the-art across a variety of evaluation metrics and that it has the potential to offer significant financial gains in a practical trading context., Comment: Accepted to AAAI-22
Published: 2022

10. Investigating Health-Aware Smart-Nudging with Machine Learning to Help People Pursue Healthier Eating-Habits

Author: Khan, Mansura A, Muhammad, Khalil, Smyth, Barry, and Coyle, David
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, 68U35 (Primary), 68T35 (Secondary), 68T50(Secondary), I.2.1
Abstract: Food-choices and eating-habits directly contribute to our long-term health. This makes the food recommender system a potential tool to address the global crisis of obesity and malnutrition. Over the past decade, artificial-intelligence and medical researchers became more invested in researching tools that can guide and help people make healthy and thoughtful decisions around food and diet. In many typical Recommender System (RS) domains, smart nudges have been proven effective in shaping users' consumption patterns. In recent years, knowledgeable nudging and incentivizing choices started getting attention in the food domain as well. To develop smart nudging for promoting healthier food choices, we combined Machine Learning and RS technology with food-healthiness guidelines from recognized health organizations, such as the World Health Organization, Food Standards Agency, and the National Health Service United Kingdom. In this paper, we discuss our research on persuasive visualization for making users aware of the healthiness of the recommended recipes. Here, we propose three novel nudging technologies, the WHO-BubbleSlider, the FSA-ColorCoading, and the DRCI-MLCP, that encourage users to choose healthier recipes. We also propose a Topic Modeling based portion-size recommendation algorithm. To evaluate our proposed smart-nudges, we conducted an online user study with 96 participants and 92250 recipes. Results showed that, during the food decision-making process, appropriate healthiness cues make users more likely to click, browse, and choose healthier recipes over less healthy ones.
Published: 2021

11. Collaboration in the Time of COVID: A Scientometric Analysis of Multidisciplinary SARS-CoV-2 Research

Author: Cunningham, Eoghan, Smyth, Barry, and Greene, Derek
Subjects: Computer Science - Digital Libraries, Computer Science - Social and Information Networks
Abstract: The novel coronavirus SARS-CoV-2 and the COVID-19 illness it causes have inspired unprecedented levels of multidisciplinary research in an effort to address a generational public health challenge. In this work we conduct a scientometric analysis of COVID-19 research, paying particular attention to the nature of collaboration that this pandemic has fostered among different disciplines. Increased multidisciplinary collaboration has been shown to produce greater scientific impact, albeit with higher co-ordination costs. As such, we consider a collection of over 166,000 COVID-19-related articles to assess the scale and diversity of collaboration in COVID-19 research, which we compare to non-COVID-19 controls before and during the pandemic. We show that COVID-19 research teams are not only significantly smaller than their non-COVID-19 counterparts, but they are also more diverse. Furthermore, we find that COVID-19 research has increased the multidisciplinarity of authors across most scientific fields of study, indicating that COVID-19 has helped to remove some of the barriers that usually exist between disparate disciplines. Finally, we highlight a number of interesting areas of multidisciplinary research during COVID-19, and propose methodologies for visualising the nature of multidisciplinary collaboration, which may have application beyond this pandemic., Comment: Submitted to Humanities and Social Sciences Communications: accepted pending minor revisions
Published: 2021
Full Text: View/download PDF

12. Measuring Financial Time Series Similarity With a View to Identifying Profitable Stock Market Opportunities

Author: Dolphin, Rian, Smyth, Barry, Xu, Yang, and Dong, Ruihai
Subjects: Quantitative Finance - Statistical Finance, Computer Science - Machine Learning
Abstract: Forecasting stock returns is a challenging problem due to the highly stochastic nature of the market and the vast array of factors and events that can influence trading volume and prices. Nevertheless it has proven to be an attractive target for machine learning research because of the potential for even modest levels of prediction accuracy to deliver significant benefits. In this paper, we describe a case-based reasoning approach to predicting stock market returns using only historical pricing data. We argue that one of the impediments for case-based stock prediction has been the lack of a suitable similarity metric when it comes to identifying similar pricing histories as the basis for a future prediction -- traditional Euclidean and correlation based approaches are not effective for a variety of reasons -- and in this regard, a key contribution of this work is the development of a novel similarity metric for comparing historical pricing data. We demonstrate the benefits of this metric and the case-based approach in a real-world application in comparison to a variety of conventional benchmarks., Comment: 15 pages. Accepted for presentation at the International Conference on Case-Based Reasoning 2021 (ICCBR)
Published: 2021
Full Text: View/download PDF

13. Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

Author: Yang, Linyi, Li, Jiazheng, Cunningham, Pádraig, Zhang, Yue, Smyth, Barry, and Dong, Ruihai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Logic in Computer Science
Abstract: While state-of-the-art NLP models have achieved excellent performance on a wide range of tasks in recent years, important questions are being raised about their robustness and their underlying sensitivity to systematic biases that may exist in their training and test data. Such issues come to be manifest in performance problems when faced with out-of-distribution data in the field. One recent solution has been to use counterfactually augmented datasets in order to reduce any reliance on spurious patterns that may exist in the original data. Producing high-quality augmented data can be costly and time-consuming as it usually needs to involve human feedback and crowdsourcing efforts. In this work, we propose an alternative by describing and evaluating an approach to automatically generating counterfactual data for data augmentation and explanation. A comprehensive evaluation on several different datasets and using a variety of state-of-the-art benchmarks demonstrates how our approach can achieve significant improvements in model performance when compared to models trained on the original data and even when compared to models trained with the benefit of human-generated augmented data., Comment: Accepted to ACL-21
Published: 2021

14. Fact Check: Analyzing Financial Events from Multilingual News Sources

Author: Yang, Linyi, Ng, Tin Lok James, Smyth, Barry, and Dong, Ruihai
Subjects: Computer Science - Artificial Intelligence
Abstract: The explosion in the sheer magnitude and complexity of financial news data in recent years makes it increasingly challenging for investment analysts to extract valuable insights and perform analysis. We propose FactCheck in finance, a web-based news aggregator with deep learning models, to provide analysts with a holistic view of important financial events from multilingual news sources and extract events using an unsupervised clustering method. A web interface is provided to examine the credibility of news articles using a transformer-based fact-checker. The performance of the fact checker is evaluated using a dataset related to merger and acquisition (M&A) events and is shown to outperform several strong baselines., Comment: Demo
Published: 2021

15. Twin Systems for DeepCBR: A Menagerie of Deep Learning and Case-Based Reasoning Pairings for Explanation and Data Augmentation

Author: Keane, Mark T, Kenny, Eoin M, Temraz, Mohammed, Greene, Derek, and Smyth, Barry
Subjects: Computer Science - Artificial Intelligence
Abstract: Recently, it has been proposed that fruitful synergies may exist between Deep Learning (DL) and Case Based Reasoning (CBR); that there are insights to be gained by applying CBR ideas to problems in DL (what could be called DeepCBR). In this paper, we report on a program of research that applies CBR solutions to the problem of Explainable AI (XAI) in DL. We describe a series of twin-systems pairings of opaque DL models with transparent CBR models that allow the latter to explain the former using factual, counterfactual and semi-factual explanation strategies. This twinning shows that functional abstractions of DL (e.g., feature weights, feature importance and decision boundaries) can be used to drive these explanatory solutions. We also raise the prospect that this research applies to the problem of Data Augmentation in DL, underscoring the fecundity of these DeepCBR ideas., Comment: 7 pages, 4 figures, 2 tables
Published: 2021

16. Handling Climate Change Using Counterfactuals: Using Counterfactuals in Data Augmentation to Predict Crop Growth in an Uncertain Climate Future

Author: Temraz, Mohammed, Kenny, Eoin, Ruelle, Elodie, Shalloo, Laurence, Smyth, Barry, and Keane, Mark T
Subjects: Computer Science - Artificial Intelligence
Abstract: Climate change poses a major challenge to humanity, especially in its impact on agriculture, a challenge that a responsible AI should meet. In this paper, we examine a CBR system (PBI-CBR) designed to aid sustainable dairy farming by supporting grassland management, through accurate crop growth prediction. As climate changes, PBI-CBR's historical cases become less useful in predicting future grass growth. Hence, we extend PBI-CBR using data augmentation, to specifically handle disruptive climate events, using a counterfactual method (from XAI). Study 1 shows that historical, extreme climate-events (climate outlier cases) tend to be used by PBI-CBR to predict grass growth during climate disrupted periods. Study 2 shows that synthetic outliers, generated as counterfactuals on an outlier-boundary, improve the predictive accuracy of PBI-CBR, during the drought of 2018. This study also shows that an instance-based counterfactual method does better than a benchmark, constraint-guided method., Comment: 15 pages, 6 figures, 3 tables
Published: 2021

17. If Only We Had Better Counterfactual Explanations: Five Key Deficits to Rectify in the Evaluation of Counterfactual XAI Techniques

Author: Keane, Mark T, Kenny, Eoin M, Delaney, Eoin, and Smyth, Barry
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In recent years, there has been an explosion of AI research on counterfactual explanations as a solution to the problem of eXplainable AI (XAI). These explanations seem to offer technical, psychological and legal benefits over other explanation techniques. We survey 100 distinct counterfactual explanation methods reported in the literature. This survey addresses the extent to which these methods have been adequately evaluated, both psychologically and computationally, and quantifies the shortfalls occurring. For instance, only 21% of these methods have been user tested. Five key deficits in the evaluation of these methods are detailed and a roadmap, with standardised benchmark evaluations, is proposed to resolve the issues arising; issues that currently effectively block scientific progress in this field., Comment: 13 pages, 2 figures
Published: 2021

18. A Few Good Counterfactuals: Generating Interpretable, Plausible and Diverse Counterfactual Explanations

Author: Smyth, Barry and Keane, Mark T
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Counterfactual explanations provide a potentially significant solution to the Explainable AI (XAI) problem, but good, native counterfactuals have been shown to rarely occur in most datasets. Hence, the most popular methods generate synthetic counterfactuals using blind perturbation. However, such methods have several shortcomings: the resulting counterfactuals (i) may not be valid data-points (they often use features that do not naturally occur), (ii) may lack the sparsity of good counterfactuals (if they modify too many features), and (iii) may lack diversity (if the generated counterfactuals are minimal variants of one another). We describe a method designed to overcome these problems, one that adapts native counterfactuals in the original dataset, to generate sparse, diverse synthetic counterfactuals from naturally occurring features. A series of experiments are reported that systematically explore parametric variations of this novel method on common datasets to establish the conditions for optimal performance., Comment: 8 pages, 2 figures
Published: 2021

19. Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification

Author: Yang, Linyi, Kenny, Eoin M., Ng, Tin Lok James, Yang, Yi, Smyth, Barry, and Dong, Ruihai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Corporate mergers and acquisitions (M&A) account for billions of dollars of investment globally every year, and offer an interesting and challenging domain for artificial intelligence. However, in these highly sensitive domains, it is crucial to not only have a highly robust and accurate model, but be able to generate useful explanations to garner a user's trust in the automated system. Regrettably, the recent research regarding eXplainable AI (XAI) in financial text classification has received little to no attention, and many current methods for generating textual-based explanations result in highly implausible explanations, which damage a user's trust in the system. To address these issues, this paper proposes a novel methodology for producing plausible counterfactual explanations, whilst exploring the regularization benefits of adversarial training on language models in the domain of FinTech. Exhaustive quantitative experiments demonstrate that not only does this approach improve the model accuracy when compared to the current state-of-the-art and human performance, but it also generates counterfactual explanations which are significantly more plausible based on human trials., Comment: Accepted by COLING-20 (Oral)
Published: 2020

20. Good Counterfactuals and Where to Find Them: A Case-Based Technique for Generating Counterfactuals for Explainable AI (XAI)

Author: Keane, Mark T. and Smyth, Barry
Subjects: Computer Science - Artificial Intelligence, I.2.7, I.2.6
Abstract: Recently, a groundswell of research has identified the use of counterfactual explanations as a potentially significant solution to the Explainable AI (XAI) problem. It is argued that (a) technically, these counterfactual cases can be generated by permuting problem-features until a class change is found, (b) psychologically, they are much more causally informative than factual explanations, (c) legally, they are GDPR-compliant. However, there are issues around the finding of good counterfactuals using current techniques (e.g. sparsity and plausibility). We show that many commonly-used datasets appear to have few good counterfactuals for explanation purposes. So, we propose a new case based approach for generating counterfactuals using novel ideas about the counterfactual potential and explanatory coverage of a case-base. The new technique reuses patterns of good counterfactuals, present in a case-base, to generate analogous counterfactuals that can explain new problems and their solutions. Several experiments show how this technique can improve the counterfactual potential and explanatory coverage of case-bases that were previously found wanting., Comment: 15 pages, 3 figures
Published: 2020

21. Personalized, Health-Aware Recipe Recommendation: An Ensemble Topic Modeling Based Approach

Author: Khan, Mansura A., Rushe, Ellen, Smyth, Barry, and Coyle, David
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning, I.2
Abstract: Food choices are personal and complex and have a significant impact on our long-term health and quality of life. By helping users to make informed and satisfying decisions, Recommender Systems (RS) have the potential to support users in making healthier food choices. Intelligent user-modeling is a key challenge in achieving this potential. This paper investigates Ensemble Topic Modelling (EnsTM) based Feature Identification techniques for efficient user-modeling and recipe recommendation. It builds on findings in EnsTM to propose a reduced data representation format and a smart user-modeling strategy that makes capturing user-preference fast, efficient and interactive. This approach enables personalization, even in a cold-start scenario. This paper proposes two different EnsTM based and one Hybrid EnsTM based recommenders. We compared all three EnsTM based variations through a user study with 48 participants, using a large-scale, real-world corpus of 230,876 recipes, and compared them against a conventional Content Based (CB) approach. EnsTM based recommenders performed significantly better than the CB approach. Besides acknowledging multi-domain contents such as taste, demographics and cost, our proposed approach also considers users' nutritional preferences and assists them in finding recipes under diverse nutritional categories. Furthermore, it provides excellent coverage and enables implicit understanding of users' food practices. Subsequent analysis also exposed correlation between certain features and a healthier lifestyle., Comment: This is a pre-print version of the accepted full-paper in HealthRecsys2019 workshop (https://healthrecsys.github.io/2019/). The final version of the article would be published in the workshop proceedings
Published: 2019

22. RARD II: The 94 Million Related-Article Recommendation Dataset

Author: Beel, Joeran, Smyth, Barry, and Collins, Andrew
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Digital Libraries, Computer Science - Machine Learning
Abstract: The main contribution of this paper is to introduce and describe a new recommender-systems dataset (RARD II). It is based on data from Mr. DLib, a recommender-system as-a-service in the digital library and reference-management-software domain. As such, RARD II complements datasets from other domains such as books, movies, and music. The dataset encompasses 94m recommendations, delivered in the two years from September 2016 to September 2018. The dataset covers an item-space of 24m unique items. RARD II provides a range of rich recommendation data, beyond conventional ratings. For example, in addition to the usual (implicit) ratings matrices, RARD II includes the original recommendation logs, which provide a unique insight into many aspects of the algorithms that generated the recommendations. The logs enable researchers to conduct various analyses about a real-world recommender system. This includes the evaluation of meta-learning approaches for predicting algorithm performance. In this paper, we summarise the key features of this dataset release, describe how it was generated and discuss some of its unique features. Compared to its predecessor RARD, RARD II contains 64% more recommendations, 187% more features (algorithms, parameters, and statistics), 50% more clicks, 140% more documents, and one additional service partner (JabRef).
Published: 2018

23. Aggregating Content and Network Information to Curate Twitter User Lists

Author: Greene, Derek, Sheridan, Gavin, Smyth, Barry, and Cunningham, Pádraig
Subjects: Computer Science - Social and Information Networks, Computer Science - Artificial Intelligence, Physics - Physics and Society
Abstract: Twitter introduced user lists in late 2009, allowing users to be grouped according to meaningful topics or themes. Lists have since been adopted by media outlets as a means of organising content around news stories. Thus the curation of these lists is important - they should contain the key information gatekeepers and present a balanced perspective on a story. Here we address this list curation process from a recommender systems perspective. We propose a variety of criteria for generating user list recommendations, based on content analysis, network analysis, and the "crowdsourcing" of existing user lists. We demonstrate that these types of criteria are often only successful for datasets with certain characteristics. To resolve this issue, we propose the aggregation of these different "views" of a news story on Twitter to produce more accurate user recommendations to support the curation process.
Published: 2012

23 results on '"Smyth, Barry"'
