Author: "Bielikova, Maria" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Bielikova, Maria"' showing total 753 results

Start Over Author "Bielikova, Maria"

753 results on '"Bielikova, Maria"'

1. Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Author: Cegin, Jan, Pecher, Branislav, Simko, Jakub, Srba, Ivan, Bielikova, Maria, and Brusilovsky, Peter
Subjects: Computer Science - Computation and Language
Abstract: The generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing works on augmentation leverage the few-shot scenarios, where samples are given to LLMs as part of prompts, leading to better augmentations. Yet, the samples are mostly selected randomly and a comprehensive overview of the effects of other (more ``informed'') sample selection strategies is lacking. In this work, we compare sample selection strategies existing in few-shot learning literature and investigate their effects in LLM-based textual augmentation. We evaluate this on in-distribution and out-of-distribution classifier performance. Results indicate, that while some ``informed'' selection strategies increase the performance of models, especially for out-of-distribution data, it happens only seldom and with marginal performance increases. Unless further advances are made, a default of random sample selection remains a good option for augmentation practitioners.
Published: 2024

2. Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer

Author: Belanec, Robert, Ostermann, Simon, Srba, Ivan, and Bielikova, Maria
Subjects: Computer Science - Computation and Language
Abstract: Prompt tuning is an efficient solution for training large language models (LLMs). However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each newly added task. While recent work on task vectors applied arithmetic operations on full model weights to achieve the desired multi-task performance, a similar approach for soft-prompts is still missing. To this end, we introduce Task Prompt Vectors, created by element-wise difference between weights of tuned soft-prompts and their random initialization. Experimental results on 12 NLU datasets show that task prompt vectors can be used in low-resource settings to effectively initialize prompt tuning on similar tasks. In addition, we show that task prompt vectors are independent of the random initialization of prompt tuning on 2 different language model architectures. This allows prompt arithmetics with the pre-trained vectors from different tasks. In this way, we provide a competitive alternative to state-of-the-art baselines by arithmetic addition of task prompt vectors from multiple tasks.
Published: 2024

3. Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation

Author: Pecher, Branislav, Cegin, Jan, Belanec, Robert, Simko, Jakub, Srba, Ivan, and Bielikova, Maria
Subjects: Computer Science - Computation and Language
Abstract: While fine-tuning of pre-trained language models generally helps to overcome the lack of labelled training samples, it also displays model performance instability. This instability mainly originates from randomness in initialisation or data shuffling. To address this, researchers either modify the training process or augment the available samples, which typically results in increased computational costs. We propose a new mitigation strategy, called Delayed Ensemble with Noisy Interpolation (DENI), that leverages the strengths of ensembling, noise regularisation and model interpolation, while retaining computational efficiency. We compare DENI with 9 representative mitigation strategies across 3 models, 4 tuning strategies and 7 text classification datasets. We show that: 1) DENI outperforms the best performing mitigation strategy (Ensemble), while using only a fraction of its cost; 2) the mitigation strategies are beneficial for parameter-efficient fine-tuning (PEFT) methods, outperforming full fine-tuning in specific cases; and 3) combining DENI with data augmentation often leads to even more effective instability mitigation., Comment: Accepted to the Findings of the EMNLP'24 Conference
Published: 2024

4. On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

Author: Pecher, Branislav, Srba, Ivan, and Bielikova, Maria
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: While learning with limited labelled data can improve performance when the labels are lacking, it is also sensitive to the effects of uncontrolled randomness introduced by so-called randomness factors (e.g., varying order of data). We propose a method to systematically investigate the effects of randomness factors while taking the interactions between them into consideration. To measure the true effects of an individual randomness factor, our method mitigates the effects of other factors and observes how the performance varies across multiple runs. Applying our method to multiple randomness factors across in-context learning and fine-tuning approaches on 7 representative text classification tasks and meta-learning on 3 tasks, we show that: 1) disregarding interactions between randomness factors in existing works caused inconsistent findings due to incorrect attribution of the effects of randomness factors, such as disproving the consistent sensitivity of in-context learning to sample order even with random sample selection; and 2) besides mutual interactions, the effects of randomness factors, especially sample order, are also dependent on more systematic choices unexplored in existing works, such as number of classes, samples per class or choice of prompt format., Comment: Accepted to the EMNLP'24 Main Conference
Published: 2024

5. Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance

Author: Pecher, Branislav, Srba, Ivan, and Bielikova, Maria
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: When solving NLP tasks with limited labelled data, researchers can either use a general large language model without further update, or use a small number of labelled examples to tune a specialised smaller model. In this work, we address the research gap of how many labelled samples are required for the specialised small models to outperform general large models, while taking the performance variance into consideration. By observing the behaviour of fine-tuning, instruction-tuning, prompting and in-context learning on 7 language models, we identify such performance break-even points across 8 representative text classification tasks of varying characteristics. We show that the specialised models often need only few samples (on average $10 - 1000$) to be on par or better than the general ones. At the same time, the number of required labels strongly depends on the dataset or task characteristics, with this number being significantly lower on multi-class datasets (up to $100$) than on binary datasets (up to $5000$). When performance variance is taken into consideration, the number of required labels increases on average by $100 - 200\%$ and even up to $1500\%$ in specific cases.
Published: 2024

6. Automatic Combination of Sample Selection Strategies for Few-Shot Learning

Author: Pecher, Branislav, Srba, Ivan, Bielikova, Maria, and Vanschoren, Joaquin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: In few-shot learning, such as meta-learning, few-shot fine-tuning or in-context learning, the limited number of samples used to train a model have a significant impact on the overall success. Although a large number of sample selection strategies exist, their impact on the performance of few-shot learning is not extensively known, as most of them have been so far evaluated in typical supervised settings only. In this paper, we thoroughly investigate the impact of 20 sample selection strategies on the performance of 5 few-shot learning approaches over 8 image and 6 text datasets. In addition, we propose a new method for automatic combination of sample selection strategies (ACSESS) that leverages the strengths and complementary information of the individual strategies. The experimental results show that our method consistently outperforms the individual selection strategies, as well as the recently proposed method for selecting support examples for in-context learning. We also show a strong modality, dataset and approach dependence for the majority of strategies as well as their dependence on the number of shots - demonstrating that the sample selection strategies play a significant role for lower number of shots, but regresses to random selection at higher number of shots.
Published: 2024

7. Authorship Obfuscation in Multilingual Machine-Generated Text Detection

Author: Macko, Dominik, Moro, Robert, Uchendu, Adaku, Srba, Ivan, Lucas, Jason Samuel, Yamashita, Michiharu, Tripto, Nafis Irtiza, Lee, Dongwon, Simko, Jakub, and Bielikova, Maria
Subjects: Computer Science - Computation and Language
Abstract: High-quality text generation capability of recent Large Language Models (LLMs) causes concerns about their misuse (e.g., in massive generation/spread of disinformation). Machine-generated text (MGT) detection is important to cope with such threats. However, it is susceptible to authorship obfuscation (AO) methods, such as paraphrasing, which can cause MGTs to evade detection. So far, this was evaluated only in monolingual settings. Thus, the susceptibility of recently proposed multilingual detectors is still unknown. We fill this gap by comprehensively benchmarking the performance of 10 well-known AO methods, attacking 37 MGT detection methods against MGTs in 11 languages (i.e., 10 $\times$ 37 $\times$ 11 = 4,070 combinations). We also evaluate the effect of data augmentation on adversarial robustness using obfuscated texts. The results indicate that all tested AO methods can cause evasion of automated detection in all tested languages, where homoglyph attacks are especially successful. However, some of the AO methods severely damaged the text, making it no longer readable or easily recognizable by humans (e.g., changed language, weird characters)., Comment: Accepted to EMNLP 2024 Findings
Published: 2024

8. Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

Author: Cegin, Jan, Pecher, Branislav, Simko, Jakub, Srba, Ivan, Bielikova, Maria, and Brusilovsky, Peter
Subjects: Computer Science - Computation and Language
Abstract: The latest generative large language models (LLMs) have found their application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune downstream models. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affect the quality of paraphrased data (and downstream models). In this study, we investigate three text diversity incentive methods well established in crowdsourcing: taboo words, hints by previous outlier solutions, and chaining on previous outlier solutions. Using these incentive methods as part of instructions to LLMs augmenting text datasets, we measure their effects on generated texts lexical diversity and downstream model performance. We compare the effects over 5 different LLMs, 6 datasets and 2 downstream models. We show that diversity is most increased by taboo words, but downstream model performance is highest with hints., Comment: ACL'24 version, 24 pages
Published: 2024

9. A Survey on Stability of Learning with Limited Labelled Data and its Sensitivity to the Effects of Randomness

Author: Pecher, Branislav, Srba, Ivan, and Bielikova, Maria
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Learning with limited labelled data, such as prompting, in-context learning, fine-tuning, meta-learning or few-shot learning, aims to effectively train a model using only a small amount of labelled samples. However, these approaches have been observed to be excessively sensitive to the effects of uncontrolled randomness caused by non-determinism in the training process. The randomness negatively affects the stability of the models, leading to large variances in results across training runs. When such sensitivity is disregarded, it can unintentionally, but unfortunately also intentionally, create an imaginary perception of research progress. Recently, this area started to attract research attention and the number of relevant studies is continuously growing. In this survey, we provide a comprehensive overview of 415 papers addressing the effects of randomness on the stability of learning with limited labelled data. We distinguish between four main tasks addressed in the papers (investigate/evaluate; determine; mitigate; benchmark/compare/report randomness effects), providing findings for each one. Furthermore, we identify and discuss seven challenges and open problems together with possible directions to facilitate further research. The ultimate goal of this survey is to emphasise the importance of this growing research area, which so far has not received an appropriate level of attention, and reveal impactful directions for future research., Comment: Accepted to ACM Comput. Surv. 2024
Published: 2023
Full Text: View/download PDF

10. Disinformation Capabilities of Large Language Models

Author: Vykopal, Ivan, Pikuliak, Matúš, Srba, Ivan, Moro, Robert, Macko, Dominik, and Bielikova, Maria
Subjects: Computer Science - Computation and Language
Abstract: Automated disinformation generation is often listed as an important risk associated with large language models (LLMs). The theoretical ability to flood the information space with disinformation content might have dramatic consequences for societies around the world. This paper presents a comprehensive study of the disinformation capabilities of the current generation of LLMs to generate false news articles in the English language. In our study, we evaluated the capabilities of 10 LLMs using 20 disinformation narratives. We evaluated several aspects of the LLMs: how good they are at generating news articles, how strongly they tend to agree or disagree with the disinformation narratives, how often they generate safety warnings, etc. We also evaluated the abilities of detection models to detect these articles as LLM-generated. We conclude that LLMs are able to generate convincing news articles that agree with dangerous disinformation narratives.
Published: 2023

11. MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark

Author: Macko, Dominik, Moro, Robert, Uchendu, Adaku, Lucas, Jason Samuel, Yamashita, Michiharu, Pikuliak, Matúš, Srba, Ivan, Le, Thai, Lee, Dongwon, Simko, Jakub, and Bielikova, Maria
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: There is a lack of research into capabilities of recent LLMs to generate convincing text in languages other than English and into performance of detectors of machine-generated text in multilingual settings. This is also reflected in the available benchmarks which lack authentic texts in languages other than English and predominantly cover older generators. To fill this gap, we introduce MULTITuDE, a novel benchmarking dataset for multilingual machine-generated text detection comprising of 74,081 authentic and machine-generated texts in 11 languages (ar, ca, cs, de, en, es, nl, pt, ru, uk, and zh) generated by 8 multilingual LLMs. Using this benchmark, we compare the performance of zero-shot (statistical and black-box) and fine-tuned detectors. Considering the multilinguality, we evaluate 1) how these detectors generalize to unseen languages (linguistically similar as well as dissimilar) and unseen LLMs and 2) whether the detectors improve their performance when trained on multiple languages.
Published: 2023
Full Text: View/download PDF

12. FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Author: Lekadir, Karim, Feragen, Aasa, Fofanah, Abdul Joseph, Frangi, Alejandro F, Buyx, Alena, Emelie, Anais, Lara, Andrea, Porras, Antonio R, Chan, An-Wen, Navarro, Arcadi, Glocker, Ben, Botwe, Benard O, Khanal, Bishesh, Beger, Brigit, Wu, Carol C, Cintas, Celia, Langlotz, Curtis P, Rueckert, Daniel, Mzurikwao, Deogratias, Fotiadis, Dimitrios I, Zhussupov, Doszhan, Ferrante, Enzo, Meijering, Erik, Weicken, Eva, González, Fabio A, Asselbergs, Folkert W, Prior, Fred, Krestin, Gabriel P, Collins, Gary, Tegenaw, Geletaw S, Kaissis, Georgios, Misuraca, Gianluca, Tsakou, Gianna, Dwivedi, Girish, Kondylakis, Haridimos, Jayakody, Harsha, Woodruf, Henry C, Mayer, Horst Joachim, Aerts, Hugo JWL, Walsh, Ian, Chouvarda, Ioanna, Buvat, Irène, Tributsch, Isabell, Rekik, Islem, Duncan, James, Kalpathy-Cramer, Jayashree, Zahir, Jihad, Park, Jinah, Mongan, John, Gichoya, Judy W, Schnabel, Julia A, Kushibar, Kaisar, Riklund, Katrine, Mori, Kensaku, Marias, Kostas, Amugongo, Lameck M, Fromont, Lauren A, Maier-Hein, Lena, Alberich, Leonor Cerdá, Rittner, Leticia, Phiri, Lighton, Marrakchi-Kacem, Linda, Donoso-Bach, Lluís, Martí-Bonmatí, Luis, Cardoso, M Jorge, Bobowicz, Maciej, Shabani, Mahsa, Tsiknakis, Manolis, Zuluaga, Maria A, Bielikova, Maria, Fritzsche, Marie-Christine, Camacho, Marina, Linguraru, Marius George, Wenzel, Markus, De Bruijne, Marleen, Tolsgaard, Martin G, Ghassemi, Marzyeh, Ashrafuzzaman, Md, Goisauf, Melanie, Yaqub, Mohammad, Abadía, Mónica Cano, Mahmoud, Mukhtar M E, Elattar, Mustafa, Rieke, Nicola, Papanikolaou, Nikolaos, Lazrak, Noussair, Díaz, Oliver, Salvado, Olivier, Pujol, Oriol, Sall, Ousmane, Guevara, Pamela, Gordebeke, Peter, Lambin, Philippe, Brown, Pieta, Abolmaesumi, Purang, Dou, Qi, Lu, Qinghua, Osuala, Richard, Nakasi, Rose, Zhou, S Kevin, Napel, Sandy, Colantonio, Sara, Albarqouni, Shadi, Joshi, Smriti, Carter, Stacy, Klein, Stefan, Petersen, Steffen E, Aussó, Susanna, Awate, Suyash, Raviv, Tammy Riklin, Cook, Tessa, Mutsvangwa, Tinashe E M, Rogers, Wendy A, Niessen, Wiro J, Puig-Bosch, Xènia, Zeng, Yi, Mohammed, Yunusa G, Aquino, Yves Saint James, Salahuddin, Zohaib, and Starmans, Martijn P A
Subjects: Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, I.2.0, I.4.0, I.5.0
Abstract: Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline as the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 inter-disciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare, i.e. Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices were defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation towards clinical practice of medical AI.
Published: 2023

13. Multilingual Previously Fact-Checked Claim Retrieval

Author: Pikuliak, Matúš, Srba, Ivan, Moro, Robert, Hromadka, Timo, Smolen, Timotej, Melisek, Martin, Vykopal, Ivan, Simko, Jakub, Podrouzek, Juraj, and Bielikova, Maria
Subjects: Computer Science - Computation and Language
Abstract: Fact-checkers are often hampered by the sheer amount of online content that needs to be fact-checked. NLP can help them by retrieving already existing fact-checks relevant to the content being investigated. This paper introduces a new multilingual dataset -- MultiClaim -- for previously fact-checked claim retrieval. We collected 28k posts in 27 languages from social media, 206k fact-checks in 39 languages written by professional fact-checkers, as well as 31k connections between these two groups. This is the most extensive and the most linguistically diverse dataset of this kind to date. We evaluated how different unsupervised methods fare on this dataset and its various dimensions. We show that evaluating such a diverse dataset has its complexities and proper care needs to be taken before interpreting the results. We also evaluated a supervised fine-tuning approach, improving upon the unsupervised method significantly., Comment: Accepted at EMNLP 2023
Published: 2023
Full Text: View/download PDF

14. Eye Tracking as a Source of Implicit Feedback in Recommender Systems: A Preliminary Analysis

Author: de Leon-Martinez, Santiago, Moro, Robert, and Bielikova, Maria
Subjects: Computer Science - Information Retrieval, Computer Science - Human-Computer Interaction
Abstract: Eye tracking in recommender systems can provide an additional source of implicit feedback, while helping to evaluate other sources of feedback. In this study, we use eye tracking data to inform a collaborative filtering model for movie recommendation providing an improvement over the click-based implementations and additionally analyze the area of interest (AOI) duration as related to the known information of click data and movies seen previously, showing AOI information consistently coincides with these items of interest., Comment: Paper accepted to Eyes4ICU workshop at ETRA 2023
Published: 2023
Full Text: View/download PDF

15. Searching for Discriminative Words in Multidimensional Continuous Feature Space

Author: Sajgalik, Marius, Barla, Michal, and Bielikova, Maria
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Neural and Evolutionary Computing, I.2.7
Abstract: Word feature vectors have been proven to improve many NLP tasks. With recent advances in unsupervised learning of these feature vectors, it became possible to train it with much more data, which also resulted in better quality of learned features. Since it learns joint probability of latent features of words, it has the advantage that we can train it without any prior knowledge about the goal task we want to solve. We aim to evaluate the universal applicability property of feature vectors, which has been already proven to hold for many standard NLP tasks like part-of-speech tagging or syntactic parsing. In our case, we want to understand the topical focus of text documents and design an efficient representation suitable for discriminating different topics. The discriminativeness can be evaluated adequately on text categorisation task. We propose a novel method to extract discriminative keywords from documents. We utilise word feature vectors to understand the relations between words better and also understand the latent topics which are discussed in the text and not mentioned directly but inferred logically. We also present a simple way to calculate document feature vectors out of extracted discriminative words. We evaluate our method on the four most popular datasets for text categorisation. We show how different discriminative metrics influence the overall results. We demonstrate the effectiveness of our approach by achieving state-of-the-art results on text categorisation task using just a small number of extracted keywords. We prove that word feature vectors can substantially improve the topical inference of documents' meaning. We conclude that distributed representation of words can be used to build higher levels of abstraction as we demonstrate and build feature vectors of documents.
Published: 2022
Full Text: View/download PDF

16. Autonomation, not Automation: Activities and Needs of Fact-checkers as a Basis for Designing Human-Centered AI Systems

Author: Hrckova, Andrea, Moro, Robert, Srba, Ivan, Simko, Jakub, and Bielikova, Maria
Subjects: Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
Abstract: To mitigate the negative effects of false information more effectively, the development of Artificial Intelligence (AI) systems assisting fact-checkers is needed. Nevertheless, the lack of focus on the needs of these stakeholders results in their limited acceptance and skepticism toward automating the whole fact-checking process. In this study, we conducted semi-structured in-depth interviews with Central European fact-checkers. Their activities and problems were analyzed using iterative content analysis. The most significant problems were validated with a survey of European fact-checkers, in which we collected 24 responses from 20 countries, i.e., 62\% of active European signatories of the International Fact-Checking Network (IFCN). Our contributions include an in-depth examination of the variability of fact-checking work in non-English speaking regions, which still remained largely uncovered. By aligning them with the knowledge from prior studies, we created conceptual models that help understand the fact-checking processes. Thanks to the interdisciplinary collaboration, we extend the fact-checking process in AI research by three additional stages. In addition, we mapped our findings on the fact-checkers' activities and needs to the relevant tasks for AI research. The new opportunities identified for AI researchers and developers have implications for the focus of AI research in this domain., Comment: 37 pages, 14 figures, 2 annexes
Published: 2022

17. Auditing YouTube's Recommendation Algorithm for Misinformation Filter Bubbles

Author: Srba, Ivan, Moro, Robert, Tomlein, Matus, Pecher, Branislav, Simko, Jakub, Stefancova, Elena, Kompan, Michal, Hrckova, Andrea, Podrouzek, Juraj, Gavornik, Adrian, and Bielikova, Maria
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Social and Information Networks
Abstract: In this paper, we present results of an auditing study performed over YouTube aimed at investigating how fast a user can get into a misinformation filter bubble, but also what it takes to "burst the bubble", i.e., revert the bubble enclosure. We employ a sock puppet audit methodology, in which pre-programmed agents (acting as YouTube users) delve into misinformation filter bubbles by watching misinformation promoting content. Then they try to burst the bubbles and reach more balanced recommendations by watching misinformation debunking content. We record search results, home page results, and recommendations for the watched videos. Overall, we recorded 17,405 unique videos, out of which we manually annotated 2,914 for the presence of misinformation. The labeled data was used to train a machine learning model classifying videos into three classes (promoting, debunking, neutral) with the accuracy of 0.82. We use the trained model to classify the remaining videos that would not be feasible to annotate manually. Using both the manually and automatically annotated data, we observe the misinformation bubble dynamics for a range of audited topics. Our key finding is that even though filter bubbles do not appear in some situations, when they do, it is possible to burst them by watching misinformation debunking content (albeit it manifests differently from topic to topic). We also observe a sudden decrease of misinformation filter bubble effect when misinformation debunking videos are watched after misinformation promoting videos, suggesting a strong contextuality of recommendations. Finally, when comparing our results with a previous similar study, we do not observe significant improvements in the overall quantity of recommended misinformation content., Comment: Just accepted to ACM Transactions on Recommender Systems (ACM TORS). arXiv admin note: substantial text overlap with arXiv:2203.13769
Published: 2022
Full Text: View/download PDF

18. Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims

Author: Srba, Ivan, Pecher, Branislav, Tomlein, Matus, Moro, Robert, Stefancova, Elena, Simko, Jakub, and Bielikova, Maria
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: False information has a significant negative influence on individuals as well as on the whole society. Especially in the current COVID-19 era, we witness an unprecedented growth of medical misinformation. To help tackle this problem with machine learning approaches, we are publishing a feature-rich dataset of approx. 317k medical news articles/blogs and 3.5k fact-checked claims. It also contains 573 manually and more than 51k automatically labelled mappings between claims and articles. Mappings consist of claim presence, i.e., whether a claim is contained in a given article, and article stance towards the claim. We provide several baselines for these two tasks and evaluate them on the manually labelled part of the dataset. The dataset enables a number of additional tasks related to medical misinformation, such as misinformation characterisation studies or studies of misinformation diffusion between sources., Comment: 11 pages, 4 figures, SIGIR 2022 Resource paper track
Published: 2022
Full Text: View/download PDF

19. An Audit of Misinformation Filter Bubbles on YouTube: Bubble Bursting and Recent Behavior Changes

Author: Tomlein, Matus, Pecher, Branislav, Simko, Jakub, Srba, Ivan, Moro, Robert, Stefancova, Elena, Kompan, Michal, Hrckova, Andrea, Podrouzek, Juraj, and Bielikova, Maria
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: The negative effects of misinformation filter bubbles in adaptive systems have been known to researchers for some time. Several studies investigated, most prominently on YouTube, how fast a user can get into a misinformation filter bubble simply by selecting wrong choices from the items offered. Yet, no studies so far have investigated what it takes to burst the bubble, i.e., revert the bubble enclosure. We present a study in which pre-programmed agents (acting as YouTube users) delve into misinformation filter bubbles by watching misinformation promoting content (for various topics). Then, by watching misinformation debunking content, the agents try to burst the bubbles and reach more balanced recommendation mixes. We recorded the search results and recommendations, which the agents encountered, and analyzed them for the presence of misinformation. Our key finding is that bursting of a filter bubble is possible, albeit it manifests differently from topic to topic. Moreover, we observe that filter bubbles do not truly appear in some situations. We also draw a direct comparison with a previous study. Sadly, we did not find much improvements in misinformation occurrences, despite recent pledges by YouTube., Comment: RecSys '21: Fifteenth ACM Conference on Recommender System
Published: 2022
Full Text: View/download PDF

20. Exploring Customer Price Preference and Product Profit Role in Recommender Systems

Author: Kompan, Michal, Gaspar, Peter, Macina, Jakub, Cimerman, Matus, and Bielikova, Maria
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Most of the research in the recommender systems domain is focused on the optimization of the metrics based on historical data such as Mean Average Precision (MAP) or Recall. However, there is a gap between the research and industry since the leading Key Performance Indicators (KPIs) for businesses are revenue and profit. In this paper, we explore the impact of manipulating the profit awareness of a recommender system. An average e-commerce business does not usually use a complicated recommender algorithm. We propose an adjustment of a predicted ranking for score-based recommender systems and explore the effect of the profit and customers' price preferences on two industry datasets from the fashion domain. In the experiments, we show the ability to improve both the precision and the generated recommendations' profit. Such an outcome represents a win-win situation when e-commerce increases the profit and customers get more valuable recommendations., Comment: in IEEE Intelligent Systems
Published: 2022
Full Text: View/download PDF

21. A Study of Fake News Reading and Annotating in Social Media Context

Author: Simko, Jakub, Racsko, Patrik, Tomlein, Matus, Hanakova, Martin, Moro, Robert, and Bielikova, Maria
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Computers and Society, Computer Science - Machine Learning, Computer Science - Social and Information Networks, H.5.2, H.5.4, K.4.2, H.3.1
Abstract: The online spreading of fake news is a major issue threatening entire societies. Much of this spreading is enabled by new media formats, namely social networks and online media sites. Researchers and practitioners have been trying to answer this by characterizing the fake news and devising automated methods for detecting them. The detection methods had so far only limited success, mostly due to the complexity of the news content and context and lack of properly annotated datasets. One possible way to boost the efficiency of automated misinformation detection methods, is to imitate the detection work of humans. It is also important to understand the news consumption behavior of online users. In this paper, we present an eye-tracking study, in which we let 44 lay participants to casually read through a social media feed containing posts with news articles, some of which were fake. In a second run, we asked the participants to decide on the truthfulness of these articles. We also describe a follow-up qualitative study with a similar scenario but this time with 7 expert fake news annotators. We present the description of both studies, characteristics of the resulting dataset (which we hereby publish) and several findings.
Published: 2021
Full Text: View/download PDF

22. The Cold-start Problem: Minimal Users' Activity Estimation

Author: Visnovsky, Juraj, Kassak, Ondrej, Kompan, Michal, and Bielikova, Maria
Subjects: Computer Science - Information Retrieval
Abstract: Cold-start problem, which arises upon the new users arrival, is one of the fundamental problems in today's recommender approaches. Moreover, in some domains as TV or multime-dia-items take long time to experience by users, thus users usually do not provide rich preference information. In this paper we analyze the minimal amount of ratings needs to be done by a user over a set of items, in order to solve or reduce the cold-start problem. In our analysis we applied clustering data mining technique in order to identify minimal amount of item's ratings required from recommender system's users, in order to be assigned to a correct cluster. In this context, cluster quality is being monitored and in case of reaching certain cluster quality threshold, the rec-ommender system could start to generate recommendations for given user, as in this point cold-start problem is considered as resolved. Our proposed approach is applicable to any domain in which user preferences are received based on explicit items rating. Our experiments are performed within the movie and jokes recommendation domain using the MovieLens and Jester dataset., Comment: 1st Workshop on Recommender Systems for Television and online Video (RecSysTV) in conjunction with 8th ACM Conference on Recommender Systems, 2014
Published: 2021

23. Session-based k-NNs with Semantic Suggestions for Next-item Prediction

Author: Rac, Miroslav, Kompan, Michal, and Bielikova, Maria
Subjects: Computer Science - Information Retrieval
Abstract: One of the most critical problems in e-commerce domain is the information overload problem. Usually, an enormous number of products is offered to a user. The characteristics of this domain force researchers to opt for session-based recommendation methods, from which nearest-neighbors-based (SkNN) approaches have been shown to be competitive with and even outperform neural network-based models. Existing SkNN approaches, however, lack the ability to detect sudden interest changes at a micro-level, i.e., during an individual session; and to adapt their recommendations to these changes. In this paper, we propose a conceptual (cSkNN) model extension for the next-item prediction allowing better adaptation to the interest changes via the semantic-level properties. We use an NLP technique to parse salient concepts from the product titles to create linguistically based product generalizations that are used for change detection and a recommendation list post-filtering. We conducted experiments with two versions of our extension that differ in semantics derivation procedure while both showing an improvement over the existing SkNN method on a sparse fashion e-commerce dataset., Comment: 11 pages, 3 figures, 3 tables, submitted to and presented at RecSys20 CARS workshop
Published: 2020

24. Considering temporal aspects in recommender systems: a survey

Author: Bogina, Veronika, Kuflik, Tsvi, Jannach, Dietmar, Bielikova, Maria, Kompan, Michal, and Trattner, Christoph
Published: 2023
Full Text: View/download PDF

25. NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining

Author: Pecar, Samuel, Simko, Marian, and Bielikova, Maria
Subjects: Computer Science - Computation and Language
Abstract: In this paper, we present neural model architecture submitted to the SemEval-2019 Task 9 competition: "Suggestion Mining from Online Reviews and Forums". We participated in both subtasks for domain specific and also cross-domain suggestion mining. We proposed a recurrent neural network architecture that employs Bi-LSTM layers and also self-attention mechanism. Our architecture tries to encode words via word representations using ELMo and ensembles multiple models to achieve better results. We performed experiments with different setups of our proposed model involving weighting of prediction classes for loss function. Our best model achieved in official test evaluation score of 0.6816 for subtask A and 0.6850 for subtask B. In official results, we achieved 12th and 10th place in subtasks A and B, respectively., Comment: Accepted at the SemEval-2019 International Workshop on Semantic Evaluation
Published: 2019

26. Quantitative and qualitative analysis of linking patterns of mainstream and partisan online news media in Central Europe

Author: Hrckova, Andrea, Moro, Robert, Srba, Ivan, and Bielikova, Maria
Published: 2022
Full Text: View/download PDF

27. Reconsidering the regulation of facial recognition in public spaces

Author: Solarova, Sara, Podroužek, Juraj, Mesarčík, Matúš, Gavornik, Adrian, and Bielikova, Maria
Published: 2022
Full Text: View/download PDF

28. Improving Moderation of Online Discussions via Interpretable Neural Models

Author: Švec, Andrej, Pikuliak, Matúš, Šimko, Marián, and Bieliková, Mária
Subjects: Computer Science - Computation and Language
Abstract: Growing amount of comments make online discussions difficult to moderate by human moderators only. Antisocial behavior is a common occurrence that often discourages other users from participating in discussion. We propose a neural network based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to see. Second, we highlight inappropriate parts within these comments to make the moderation faster. We evaluated our method on data from a major Slovak news discussion platform., Comment: ALW2
Published: 2018

29. Databases and Information Systems in the AI Era: Contributions from ADBIS, TPDL and EDA 2020 Workshops and Doctoral Consortium

Author: Bellatreche, Ladjel, Bentayeb, Fadila, Bieliková, Mária, Boussaid, Omar, Catania, Barbara, Ceravolo, Paolo, Demidova, Elena, Halfeld Ferrari, Mirian, Lopez, Maria Teresa Gomez, Hara, Carmem S., Kordić, Slavica, Luković, Ivan, Mannocci, Andrea, Manghi, Paolo, Osborne, Francesco, Papatheodorou, Christos, Ristić, Sonja, Sacharidis, Dimitris, Romero, Oscar, Salatino, Angelo A., Talens, Guilaine, van Keulen, Maurice, Vergoulis, Thanasis, Zumer, Maja, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Bellatreche, Ladjel, editor, Bieliková, Mária, editor, Boussaïd, Omar, editor, Catania, Barbara, editor, Darmont, Jérôme, editor, Demidova, Elena, editor, Duchateau, Fabien, editor, Hall, Mark, editor, Merčun, Tanja, editor, Novikov, Boris, editor, Papatheodorou, Christos, editor, Risse, Thomas, editor, Romero, Oscar, editor, Sautot, Lucile, editor, Talens, Guilaine, editor, Wrembel, Robert, editor, and Žumer, Maja, editor
Published: 2020
Full Text: View/download PDF

30. Scalable Real-Time Confusion Detection for Personalized Onboarding Guides

Author: Hucko, Michal, Moro, Robert, Bielikova, Maria, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bielikova, Maria, editor, Mikkonen, Tommi, editor, and Pautasso, Cesare, editor
Published: 2020
Full Text: View/download PDF

31. Pupil size variation in primary facial expressions–testing potential biomarker of self-criticism

Author: Kanovský, Martin, Halamová, Júlia, Strnádelová, Bronislava, Moro, Robert, and Bielikova, Maria
Published: 2022
Full Text: View/download PDF

32. Addressing False Information and Abusive Language in Digital Space Using Intelligent Approaches

Author: Machova, Kristina, Srba, Ivan, Sarnovský, Martin, Paralič, Ján, Kresnakova, Viera Maslej, Hrckova, Andrea, Kompan, Michal, Simko, Marian, Blaho, Radoslav, Chuda, Daniela, Bielikova, Maria, Navrat, Pavol, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Paralič, Ján, editor, Sinčák, Peter, editor, Hartono, Pitoyo, editor, and Mařík, Vladimír, editor
Published: 2021
Full Text: View/download PDF

33. FireAnt: Claim-Based Medical Misinformation Detection and Monitoring

Author: Pecher, Branislav, Srba, Ivan, Moro, Robert, Tomlein, Matus, Bielikova, Maria, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Dong, Yuxiao, editor, Ifrim, Georgiana, editor, Mladenić, Dunja, editor, Saunders, Craig, editor, and Van Hoecke, Sofie, editor
Published: 2021
Full Text: View/download PDF

34. Examining the Linking Patterns and Link Building Strategies of Mainstream and Partisan Online News Media in Central Europe

Author: Hrckova, Andrea, Moro, Robert, Srba, Ivan, Bielikova, Maria, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin (Sherman), Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, and Mugnaini, Rogério, editor
Published: 2020
Full Text: View/download PDF

35. Cross-lingual learning for text processing: A survey

Author: Pikuliak, Matúš, Šimko, Marián, and Bieliková, Mária
Published: 2021
Full Text: View/download PDF

36. Web-Navigation Skill Assessment Through Eye-Tracking Data

Author: Hlavac, Patrik, Simko, Jakub, Bielikova, Maria, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Welzer, Tatjana, editor, Eder, Johann, editor, Podgorelec, Vili, editor, and Kamišalić Latifić, Aida, editor
Published: 2019
Full Text: View/download PDF

37. Towards Combining Multitask and Multilingual Learning

Author: Pikuliak, Matus, Simko, Marian, Bielikova, Maria, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Catania, Barbara, editor, Královič, Rastislav, editor, Nawrocki, Jerzy, editor, and Pighizzini, Giovanni, editor
Published: 2019
Full Text: View/download PDF

38. Navigation leads for exploratory search and navigation in digital libraries

Author: Moro, Robert and Bielikova, Maria
Published: 2020
Full Text: View/download PDF

39. Automatic Concept Relationships Discovery for an Adaptive E-Course

Author: International Working Group on Educational Data Mining, Simko, Marian, and Bielikova, Maria
Abstract: To make learning process more effective, the educational systems deliver content adapted to specific user needs. Adequate personalization requires the domain of learning to be described explicitly in a particular detail, involving relationships between knowledge elements referred to as concepts. Manual creation of necessary annotations is in the case of larger courses a demanding task. In this paper we tackle a concept relationship discovery problem that is a step in adaptive e-course authoring process. We propose a method of automatic concept relationship discovery for an adaptive e-course. We present two approaches based on domain model graph analysis. We evaluate our method in the domain of programming. (Contains 2 figures and 1 table.) [Funding was provided by the Cultural and Educational Grant Agency of the Slovak Republic. For the complete proceedings, "Proceedings of the International Conference on Educational Data Mining (EDM) (2nd, Cordoba, Spain, July 1-3, 2009)," see ED539041.]
Published: 2009

40. Towards Detection of Usability Issues by Measuring Emotions

Author: Stefancova, Elena, Moro, Robert, Bielikova, Maria, Barbosa, Simone Diniz Junqueira, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Benczúr, András, editor, Thalheim, Bernhard, editor, Horváth, Tomáš, editor, Chiusano, Silvia, editor, Cerquitelli, Tania, editor, Sidló, Csaba, editor, and Revesz, Peter Z., editor
Published: 2018
Full Text: View/download PDF

41. FireAnt: Claim-Based Medical Misinformation Detection and Monitoring

Author: Pecher, Branislav, primary, Srba, Ivan, additional, Moro, Robert, additional, Tomlein, Matus, additional, and Bielikova, Maria, additional
Published: 2021
Full Text: View/download PDF

42. Employing community question answering for online discussions in university courses: Students’ perspective

Author: Srba, Ivan, Savic, Milos, Bielikova, Maria, Ivanovic, Mirjana, and Pautasso, Cesare
Published: 2019
Full Text: View/download PDF

43. Searching for discriminative words in multidimensional continuous feature space

Author: Sajgalik, Marius, Barla, Michal, and Bielikova, Maria
Published: 2019
Full Text: View/download PDF

44. Addressing False Information and Abusive Language in Digital Space Using Intelligent Approaches

Author: Machova, Kristina, primary, Srba, Ivan, additional, Sarnovský, Martin, additional, Paralič, Ján, additional, Kresnakova, Viera Maslej, additional, Hrckova, Andrea, additional, Kompan, Michal, additional, Simko, Marian, additional, Blaho, Radoslav, additional, Chuda, Daniela, additional, Bielikova, Maria, additional, and Navrat, Pavol, additional
Published: 2020
Full Text: View/download PDF