Author: "Yilmaz, Emine" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yilmaz, Emine"' showing total 1,230 results


1. SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval

Author: Rahmani, Hossein A., Wang, Xi, Yilmaz, Emine, Craswell, Nick, Mitra, Bhaskar, and Thomas, Paul
Subjects: Computer Science - Information Retrieval
Abstract: Large-scale test collections play a crucial role in Information Retrieval (IR) research. However, according to the Cranfield paradigm and the research into publicly available datasets, the existing information retrieval research studies are commonly developed on small-scale datasets that rely on human assessors for relevance judgments - a time-intensive and expensive process. Recent studies have shown the strong capability of Large Language Models (LLMs) in producing reliable relevance judgments with human accuracy but at a greatly reduced cost. In this paper, to address the missing large-scale ad-hoc document retrieval dataset, we extend the TREC Deep Learning Track (DL) test collection via additional language model synthetic labels to enable researchers to test and evaluate their search systems at a large scale. Specifically, such a test collection includes more than 1,900 test queries from the previous years of tracks. We compare system evaluation with past human labels from past years and find that our synthetically created large-scale test collection can lead to highly correlated system rankings., Comment: 9 pages, resource paper
Published: 2024

2. LLMJudge: LLMs for Relevance Judgments

Author: Rahmani, Hossein A., Yilmaz, Emine, Craswell, Nick, Mitra, Bhaskar, Thomas, Paul, Clarke, Charles L. A., Aliannejadi, Mohammad, Siro, Clemencia, and Faggioli, Guglielmo
Subjects: Computer Science - Information Retrieval
Abstract: The LLMJudge challenge is organized as part of the LLM4Eval workshop at SIGIR 2024. Test collections are essential for evaluating information retrieval (IR) systems. The evaluation and tuning of a search system is largely based on relevance labels, which indicate whether a document is useful for a specific search and user. However, collecting relevance judgments on a large scale is costly and resource-intensive. Consequently, typical experiments rely on third-party labelers who may not always produce accurate annotations. The LLMJudge challenge aims to explore an alternative approach by using LLMs to generate relevance judgments. Recent studies have shown that LLMs can generate reliable relevance judgments for search systems. However, it remains unclear which LLMs can match the accuracy of human labelers, which prompts are most effective, how fine-tuned open-source LLMs compare to closed-source LLMs like GPT-4, whether there are biases in synthetically generated data, and if data leakage affects the quality of generated labels. This challenge will investigate these questions, and the collected data will be released as a package to support automatic relevance judgment research in information retrieval and search., Comment: LLMJudge Challenge Overview, 3 pages
Published: 2024

3. Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024

Author: Rahmani, Hossein A., Siro, Clemencia, Aliannejadi, Mohammad, Craswell, Nick, Clarke, Charles L. A., Faggioli, Guglielmo, Mitra, Bhaskar, Thomas, Paul, and Yilmaz, Emine
Subjects: Computer Science - Information Retrieval
Abstract: The first edition of the workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) took place in July 2024, co-located with the ACM SIGIR Conference 2024 in the USA (SIGIR 2024). The aim was to bring information retrieval researchers together around the topic of LLMs for evaluation in information retrieval that gathered attention with the advancement of large language models and generative AI. Given the novelty of the topic, the workshop was focused around multi-sided discussions, namely panels and poster sessions of the accepted proceedings papers., Comment: LLM4Eval Workshop Report
Published: 2024

4. Adaptive Retrieval-Augmented Generation for Conversational Systems

Author: Wang, Xi, Sen, Procheta, Li, Ruizhe, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Despite the success of integrating large language models into the development of conversational systems, many studies have shown the effectiveness of retrieving and augmenting external knowledge for informative responses. Hence, many existing studies assume that Retrieval Augmented Generation (RAG) is always needed in a conversational system, without explicit control. This raises a research question about such a necessity. In this study, we propose to investigate the need for each turn of system response to be augmented with external knowledge. In particular, by leveraging human judgements on the binary choice of adaptive augmentation, we develop RAGate, a gating model, which models conversation context and relevant inputs to predict if a conversational system requires RAG for improved responses. We conduct extensive experiments on devising and applying RAGate to conversational models and well-rounded analyses of different conversational scenarios. Our experimental results and analysis indicate the effective application of RAGate in RAG-based conversational systems in identifying system responses for appropriate RAG with high-quality responses and a high generation confidence. This study also identifies the correlation between the generation's confidence level and the relevance of the augmented knowledge., Comment: 12 pages, under review
Published: 2024

5. Understanding the Role of User Profile in the Personalization of Large Language Models

Author: Wu, Bin, Shi, Zhengyan, Rahmani, Hossein A., Ramineni, Varsha, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance the performance on a wide range of tasks. However, the precise role of user profiles and their effect mechanism on LLMs remains unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we investigate how user profiles affect the personalization of LLMs. Within the user profile, we reveal that it is the historical personalized response produced or approved by users that plays a pivotal role in personalizing LLMs. This discovery unlocks the potential of LLMs to incorporate a greater number of user profiles within the constraints of limited input length. As for the position of user profiles, we observe that user profiles integrated into different positions of the input context do not contribute equally to personalization. Instead, user profiles placed closer to the beginning of the input have a greater effect on the personalization of LLMs. Our findings reveal the role of user profiles for the personalization of LLMs, and showcase how incorporating user profiles impacts performance, providing insight into how to leverage user profiles effectively.
Published: 2024

6. Instruction Tuning With Loss Over Instructions

Author: Shi, Zhengyan, Yang, Adam X., Wu, Bin, Aitchison, Laurence, Yilmaz, Emine, and Lipani, Aldo
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks (e.g., MMLU, TruthfulQA, and HumanEval) and open-ended generation benchmarks (e.g., MT-Bench and AlpacaEval). Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%. We identify two key factors influencing the effectiveness of IM: (1) The ratio between instruction length and output length in the training data; and (2) The number of training examples. We observe that IM is especially beneficial when trained on datasets with lengthy instructions paired with brief outputs, or under the Superficial Alignment Hypothesis (SAH) where a small amount of training examples are used for instruction tuning. Further analysis substantiates our hypothesis that our improvement can be attributed to reduced overfitting to instruction tuning datasets. It is worth noting that we are not proposing IM as a replacement for current fine-tuning processes. Instead, our work aims to provide practical guidance for instruction tuning LMs, especially in low-resource scenarios., Comment: NeurIPS 2024. Code is available at https://github.com/ZhengxiangShi/InstructionModelling
Published: 2024
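
Note on record 6: the idea in the abstract, computing the loss over the instruction as well as the output, can be illustrated with a minimal sketch of how training labels might be built. This is not the authors' implementation. The function name `build_labels` and the token ids are hypothetical, and `IGNORE_INDEX = -100` is borrowed from the default `ignore_index` of PyTorch's `CrossEntropyLoss` as an assumed convention. Any special handling of prompt-template tokens is left out.

```python
from typing import List

# Assumed convention: label positions set to this value contribute no loss.
IGNORE_INDEX = -100


def build_labels(
    instruction_ids: List[int],
    output_ids: List[int],
    loss_over_instruction: bool,
) -> List[int]:
    """Return per-token labels for one (instruction, output) training pair.

    loss_over_instruction=False: conventional instruction tuning, where only
    the output tokens are supervised and instruction tokens are masked.
    loss_over_instruction=True: instruction tokens are supervised as well,
    which is the idea the abstract describes.
    """
    if loss_over_instruction:
        return list(instruction_ids) + list(output_ids)
    return [IGNORE_INDEX] * len(instruction_ids) + list(output_ids)


def supervised_token_count(labels: List[int]) -> int:
    """Count the positions that would actually contribute to the loss."""
    return sum(1 for label in labels if label != IGNORE_INDEX)


# Hypothetical token ids: a long instruction paired with a brief output.
instruction = [11, 12, 13, 14, 15, 16]
output = [21, 22]

masked = build_labels(instruction, output, loss_over_instruction=False)
full = build_labels(instruction, output, loss_over_instruction=True)
```

With a long instruction and a brief output, the second variant supervises far more tokens per example, which is the setting the abstract reports as most favourable.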

7. Synthetic Test Collections for Retrieval Evaluation

Author: Rahmani, Hossein A., Craswell, Nick, Yilmaz, Emine, Mitra, Bhaskar, and Campos, Daniel
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Test collections play a vital role in evaluation of information retrieval (IR) systems. Obtaining a diverse set of user queries for test collection construction can be challenging, and acquiring relevance judgments, which indicate the appropriateness of retrieved documents to a query, is often costly and resource-intensive. Generating synthetic datasets using Large Language Models (LLMs) has recently gained significant attention in various applications. In IR, while previous work exploited the capabilities of LLMs to generate synthetic queries or documents to augment training data and improve the performance of ranking models, using LLMs for constructing synthetic test collections is relatively unexplored. Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems. In this paper, we comprehensively investigate whether it is possible to use LLMs to construct fully synthetic test collections by generating not only synthetic judgments but also synthetic queries. In particular, we analyse whether it is possible to construct reliable synthetic test collections and the potential risks of bias such test collections may exhibit towards LLM-based models. Our experiments indicate that using LLMs it is possible to construct synthetic test collections that can reliably be used for retrieval evaluation., Comment: SIGIR 2024
Published: 2024

8. Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness

Author: Rahmani, Hossein A., Wang, Xi, Aliannejadi, Mohammad, Naghiaei, Mohammadmehdi, and Yilmaz, Emine
Subjects: Computer Science - Information Retrieval
Abstract: Clarifying questions are an integral component of modern information retrieval systems, directly impacting user satisfaction and overall system performance. Poorly formulated questions can lead to user frustration and confusion, negatively affecting the system's performance. This research addresses the urgent need to identify and leverage key features that contribute to the classification of clarifying questions, enhancing user satisfaction. To gain deeper insights into how different features influence user satisfaction, we conduct a comprehensive analysis, considering a broad spectrum of lexical, semantic, and statistical features, such as question length and sentiment polarity. Our empirical results provide three main insights into the qualities of effective query clarification: (1) specific questions are more effective than generic ones; (2) the subjectivity and emotional tone of a question play a role; and (3) shorter and more ambiguous queries benefit significantly from clarification. Based on these insights, we implement feature-integrated user satisfaction prediction using various classifiers, both traditional and neural-based, including random forest, BERT, and large language models. Our experiments show a consistent and significant improvement, particularly in traditional classifiers, with a minimum performance boost of 45%. This study presents invaluable guidelines for refining the formulation of clarifying questions and enhancing both user satisfaction and system performance., Comment: EACL
Published: 2024

9. Benchmarking LLMs via Uncertainty Quantification

Author: Ye, Fanghua, Yang, Mingming, Pang, Jianhui, Wang, Longyue, Wong, Derek F., Yilmaz, Emine, Shi, Shuming, and Tu, Zhaopeng
Subjects: Computer Science - Computation and Language
Abstract: The proliferation of open-source Large Language Models (LLMs) from various institutions has highlighted the urgent need for comprehensive evaluation methods. However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect -- uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking approach for LLMs that integrates uncertainty quantification. Our examination involves nine LLMs (LLM series) spanning five representative natural language processing tasks. Our findings reveal that: I) LLMs with higher accuracy may exhibit lower certainty; II) Larger-scale LLMs may display greater uncertainty compared to their smaller counterparts; and III) Instruction-finetuning tends to increase the uncertainty of LLMs. These results underscore the significance of incorporating uncertainty in the evaluation of LLMs., Comment: 30 pages, accepted to NeurIPS 2024
Published: 2024

10. A Toolbox for Modelling Engagement with Educational Videos

Author: Qiu, Yuxiang, Djemili, Karim, Elezi, Denis, Shalman, Aaneel, Pérez-Ortiz, María, Yilmaz, Emine, Shawe-Taylor, John, and Bulathwela, Sahan
Subjects: Computer Science - Computers and Society, Computer Science - Information Retrieval, Computer Science - Machine Learning, Statistics - Applications, H.3.3, J.1, I.2.0
Abstract: With the advancement and utility of Artificial Intelligence (AI), personalising education to a global population could be a cornerstone of new educational systems in the future. This work presents the PEEKC dataset and the TrueLearn Python library, which contains a dataset and a series of online learner state models that are essential to facilitate research on learner engagement modelling. The TrueLearn family of models was designed following the "open learner" concept, using humanly-intuitive user representations. This family of scalable, online models also helps end-users visualise the learner models, which may in the future facilitate user interaction with their models/recommenders. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytics practitioners. The experiments show the utility of both the dataset and the library with predictive performance significantly exceeding comparative baseline models. The dataset contains a large amount of AI-related educational videos, which are of interest for building and validating AI-specific educational recommenders., Comment: In Proceedings of AAAI Conference on Artificial Intelligence 2024. arXiv admin note: text overlap with arXiv:2309.11527
Published: 2023

11. A Challenging Interventional Procedure: Transcatheter Closure of Tubular Patent Ductus Arteriosus in Patients with Pulmonary Hypertension

Author: Yucel, Ilker Kemal, Epcacan, Serdar, Bulut, Mustafa Orhan, Demir, Ibrahim Halil, Surucu, Murat, Yilmaz, Emine Hekim, Kardas, Murat, Kanlioglu, Pinar, and Celebi, Ahmet
Published: 2024

12. Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation

Author: Wang, Xi, Rahmani, Hossein A., Liu, Jiqun, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Conversational Recommendation System (CRS) is a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques. However, the current state of conversational recommendation faces numerous challenges due to its relative novelty and limited existing contributions. In this study, we delve into benchmark datasets for developing CRS models and address potential biases arising from the feedback loop inherent in multi-turn interactions, including selection bias and multiple popularity bias variants. Drawing inspiration from the success of generative data via using language models and data augmentation techniques, we present two novel strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases. Through extensive experiments on ReDial and TG-ReDial benchmark datasets, we show a consistent improvement of CRS techniques with our data augmentation approaches and offer additional insights on addressing multiple newly formulated biases., Comment: Accepted by EMNLP 2023 (Findings)
Published: 2023

13. Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting

Author: Ye, Fanghua, Fang, Meng, Li, Shenghui, and Yilmaz, Emine
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Query rewriting plays a vital role in enhancing conversational search by transforming context-dependent user queries into standalone forms. Existing approaches primarily leverage human-rewritten queries as labels to train query rewriting models. However, human rewrites may lack sufficient information for optimal retrieval performance. To overcome this limitation, we propose utilizing large language models (LLMs) as query rewriters, enabling the generation of informative query rewrites through well-designed instructions. We define four essential properties for well-formed rewrites and incorporate all of them into the instruction. In addition, we introduce the role of rewrite editors for LLMs when initial query rewrites are available, forming a "rewrite-then-edit" process. Furthermore, we propose distilling the rewriting capabilities of LLMs into smaller models to reduce rewriting latency. Our experimental evaluation on the QReCC dataset demonstrates that informative query rewrites can yield substantially improved retrieval performance compared to human rewrites, especially with sparse retrievers., Comment: 22 pages, accepted to EMNLP Findings 2023
Published: 2023

14. Utility of Balloon Occlusion Testing in Determining Fontan Suitability Among Patients with Elevated Pulmonary Artery Pressure and Additional Antegrade Pulmonary Blood Flow

Author: Demir, Ibrahim Halil, Celebi, Ahmet, Ozdemir, Dursun Muhammed, Yilmaz, Emine Hekim, Bulut, Mustafa Orhan, Surucu, Murat, Korun, Oktay, Aydemir, Numan Ali, and Yucel, Ilker Kemal
Published: 2024

15. Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues

Author: Feng, Yue, Jiao, Yunlong, Prasad, Animesh, Aletras, Nikolaos, Yilmaz, Emine, and Kazai, Gabriella
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: User Satisfaction Modeling (USM) is one of the popular choices for task-oriented dialogue systems evaluation, where user satisfaction typically depends on whether the user's task goals were fulfilled by the system. Task-oriented dialogue systems use task schema, which is a set of task attributes, to encode the user's task goals. Existing studies on USM neglect explicitly modeling the user's task goals fulfillment using the task schema. In this paper, we propose SG-USM, a novel schema-guided user satisfaction modeling framework. It explicitly models the degree to which the user's preferences regarding the task attributes are fulfilled by the system for predicting the user's satisfaction level. SG-USM employs a pre-trained language model for encoding dialogue context and task attributes. Further, it employs a fulfillment representation layer for learning how many task attributes have been fulfilled in the dialogue, an importance predictor component for calculating the importance of task attributes. Finally, it predicts the user satisfaction based on task attribute fulfillment and task attribute importance. Experimental results on benchmark datasets (i.e. MWOZ, SGD, ReDial, and JDDC) show that SG-USM consistently outperforms competitive existing methods. Our extensive analysis demonstrates that SG-USM can improve the interpretability of user satisfaction modeling, has good scalability as it can effectively deal with unseen tasks and can also effectively work in low-resource settings by leveraging unlabeled data.
Published: 2023

16. A Survey on Asking Clarification Questions Datasets in Conversational Systems

Author: Rahmani, Hossein A., Wang, Xi, Feng, Yue, Zhang, Qiang, Yilmaz, Emine, and Lipani, Aldo
Subjects: Computer Science - Information Retrieval
Abstract: The ability to understand a user's underlying needs is critical for conversational systems, especially with limited input from users in a conversation. Thus, in such a domain, Asking Clarification Questions (ACQs) to reveal users' true intent from their queries or utterances arises as an essential task. However, a key limitation of the existing ACQs studies is their incomparability, which stems from inconsistent use of data, distinct experimental setups and evaluation strategies. Therefore, in this paper, to assist the development of ACQs techniques, we comprehensively analyse the current ACQs research status, which offers a detailed comparison of publicly available datasets, and discusses the applied evaluation metrics, joined with benchmarks for multiple ACQs-related tasks. In particular, given a thorough analysis of the ACQs task, we discuss a number of corresponding research directions for the investigation of ACQs as well as the development of conversational systems., Comment: ACL 2023, 17 pages
Published: 2023

17. Towards Asking Clarification Questions for Information Seeking on Task-Oriented Dialogues

Author: Feng, Yue, Rahmani, Hossein A., Lipani, Aldo, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Task-oriented dialogue systems aim at providing users with task-specific services. Users of such systems often do not know all the information about the task they are trying to accomplish, requiring them to seek information about the task. To provide accurate and personalized task-oriented information seeking results, task-oriented dialogue systems need to address two potential issues: 1) users' inability to describe their complex information needs in their requests; and 2) ambiguous/missing information the system has about the users. In this paper, we propose a new Multi-Attention Seq2Seq Network, named MAS2S, which can ask questions to clarify the user's information needs and the user's profile in task-oriented information seeking. We also extend an existing dataset for task-oriented information seeking, leading to a dataset which contains about 100k task-oriented information seeking dialogues that are made publicly available (dataset and code are available at https://github.com/sweetalyssum/clarit). Experimental results on this dataset show that MAS2S outperforms baselines on both clarification question generation and answer prediction.
Published: 2023

18. Rethinking Semi-supervised Learning with Language Models

Author: Shi, Zhengxiang, Tonolini, Francesco, Aletras, Nikolaos, Yilmaz, Emine, Kazai, Gabriella, and Jiao, Yunlong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Semi-supervised learning (SSL) is a popular setting aiming to effectively utilize unlabelled data to improve model performance in downstream natural language processing (NLP) tasks. Currently, there are two popular approaches to make use of unlabelled data: Self-training (ST) and Task-adaptive pre-training (TAPT). ST uses a teacher model to assign pseudo-labels to the unlabelled data, while TAPT continues pre-training on the unlabelled data before fine-tuning. To the best of our knowledge, the effectiveness of TAPT in SSL tasks has not been systematically studied, and no previous work has directly compared TAPT and ST in terms of their ability to utilize the pool of unlabelled data. In this paper, we provide an extensive empirical study comparing five state-of-the-art ST approaches and TAPT across various NLP tasks and data sizes, including in- and out-of-domain settings. Surprisingly, we find that TAPT is a strong and more robust SSL learner, even when using just a few hundred unlabelled samples or in the presence of domain shifts, compared to more sophisticated ST approaches, and tends to bring greater improvements in SSL than in fully-supervised settings. Our further analysis demonstrates the risks of using ST approaches when the size of labelled or unlabelled data is small or when domain shifts exist. We offer a fresh perspective for future SSL research, suggesting the use of unsupervised pre-training objectives over dependency on pseudo labels., Comment: Findings of ACL 2023. Code is available at https://github.com/amzn/pretraining-or-self-training
Published: 2023

19. Modeling User Satisfaction Dynamics in Dialogue via Hawkes Process

Author: Ye, Fanghua, Hu, Zhiyuan, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language
Abstract: Dialogue systems have received increasing attention while automatically evaluating their performance remains challenging. User satisfaction estimation (USE) has been proposed as an alternative. It assumes that the performance of a dialogue system can be measured by user satisfaction and uses an estimator to simulate users. The effectiveness of USE depends heavily on the estimator. Existing estimators independently predict user satisfaction at each turn and ignore satisfaction dynamics across turns within a dialogue. In order to fully simulate users, it is crucial to take satisfaction dynamics into account. To fill this gap, we propose a new estimator ASAP (sAtisfaction eStimation via HAwkes Process) that treats user satisfaction across turns as an event sequence and employs a Hawkes process to effectively model the dynamics in this sequence. Experimental results on four benchmark dialogue datasets demonstrate that ASAP can substantially outperform state-of-the-art baseline estimators., Comment: To appear at ACL 2023
Published: 2023

20. Scalable Educational Question Generation with Pre-trained Language Models

Author: Bulathwela, Sahan, Muse, Hamze, and Yilmaz, Emine
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Information Retrieval, Computer Science - Machine Learning, H.3.3, J.1, I.2.0
Abstract: The automatic generation of educational questions will play a key role in scaling online education, enabling self-assessment at scale when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our extensive experiments demonstrate that EduQG can produce superior educational questions by further pre-training and fine-tuning a pre-trained language model on the scientific text and science question data., Comment: To be published at the Int. Conf. on Artificial Intelligence in Education (Tokyo, 2023)
Published: 2023

21. Query-specific Variable Depth Pooling via Query Performance Prediction towards Reducing Relevance Assessment Effort

Author: Ganguly, Debasis and Yilmaz, Emine
Subjects: Computer Science - Information Retrieval
Abstract: Due to the massive size of test collections, a standard practice in IR evaluation is to construct a 'pool' of candidate relevant documents comprised of the top-k documents retrieved by a wide range of different retrieval systems - a process called depth-k pooling. A standard practice is to set the depth (k) to a constant value for each query constituting the benchmark set. However, in this paper we argue that the annotation effort can be substantially reduced if the depth of the pool is made a variable quantity for each query, the rationale being that the number of documents relevant to the information need can widely vary across queries. Our hypothesis is that a lower depth for the former class of queries and a higher depth for the latter can potentially reduce the annotation effort without a significant change in retrieval effectiveness evaluation. We make use of standard query performance prediction (QPP) techniques to estimate the number of potentially relevant documents for each query, which is then used to determine the depth of the pool. Our experiments conducted on standard test collections demonstrate that this proposed method of employing query-specific variable depths is able to adequately reflect the relative effectiveness of IR systems with a substantially smaller annotation effort., Comment: To appear in SIGIR 2023
Published: 2023
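
Note on record 21: the pooling procedure the abstract describes (take the union of the top-k documents from each system, with k allowed to differ per query) can be sketched as below. This is only an illustration under stated assumptions. The names `Runs`, `depth_k_pool`, `variable_depth_pools` and the toy rankings are hypothetical, and the per-query depths are supplied by hand because the abstract does not say how QPP estimates are mapped to depths.

```python
from typing import Dict, List, Set

# Hypothetical input shape: runs[system][query] is a ranked list of document ids.
Runs = Dict[str, Dict[str, List[str]]]


def depth_k_pool(runs: Runs, query: str, k: int) -> Set[str]:
    """Union of the top-k documents that each system retrieved for one query."""
    pool: Set[str] = set()
    for rankings_by_query in runs.values():
        pool.update(rankings_by_query.get(query, [])[:k])
    return pool


def variable_depth_pools(
    runs: Runs, depth_for_query: Dict[str, int]
) -> Dict[str, Set[str]]:
    """Build one assessment pool per query, each with its own depth.

    depth_for_query stands in for the output of a QPP-based estimator.
    """
    return {
        query: depth_k_pool(runs, query, k)
        for query, k in depth_for_query.items()
    }


# Toy rankings from two hypothetical systems.
runs: Runs = {
    "bm25": {"q1": ["d1", "d2", "d3"], "q2": ["d7", "d8", "d9"]},
    "dense": {"q1": ["d2", "d4", "d5"], "q2": ["d8", "d7", "d6"]},
}

# q1 is pooled to depth 1, q2 to depth 3.
pools = variable_depth_pools(runs, {"q1": 1, "q2": 3})
```

In this toy case the shallow pool for q1 holds two documents to judge while the deeper pool for q2 holds four, so annotation effort follows the per-query depth rather than a single constant k.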

22. Can Population-Based Engagement Improve Personalisation? A Novel Dataset and Experiments

Author: Bulathwela, Sahan, Verma, Meghana, Pérez-Ortiz, María, Yilmaz, Emine, and Shawe-Taylor, John
Abstract: This work explores how population-based engagement prediction can address cold-start at scale in large learning resource collections. The paper introduces: (1) VLE, a novel dataset that consists of content and video based features extracted from publicly available scientific video lectures coupled with implicit and explicit signals related to learner engagement; (2) two standard tasks related to predicting and ranking context-agnostic engagement in video lectures with preliminary baselines; and (3) a set of experiments that validate the usefulness of the proposed dataset. Our experimental results indicate that the newly proposed VLE dataset leads to building context-agnostic engagement prediction models that are significantly more performant than ones based on previous datasets, mainly owing to the increase in training examples. The VLE dataset's suitability for building models towards Computer Science/Artificial Intelligence education focused on e-learning/MOOC use-cases is also evidenced. Further experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders. This is the largest and most diverse publicly available dataset to our knowledge that deals with learner engagement prediction tasks. The dataset, helper tools, descriptive statistics and example code snippets are available publicly. [For the full proceedings, see ED623995.]
Published: 2022

23. Task2KB: A Public Task-Oriented Knowledge Base

Author: Sen, Procheta, Wang, Xi, Xu, Ruiqing, and Yilmaz, Emine
Subjects: Computer Science - Information Retrieval
Abstract: Search engines and conversational assistants are commonly used to help users complete their everyday tasks such as booking travel, cooking, etc. While there are some existing datasets that can be used for this purpose, their coverage is limited to very few domains. In this paper, we propose a novel knowledge base, 'Task2KB', which is constructed using data crawled from WikiHow, an online knowledge resource offering instructional articles on a wide range of tasks. Task2KB encapsulates various types of task-related information and attributes, such as requirements, detailed step description, and available methods to complete tasks. Due to its higher coverage compared to existing related knowledge graphs, Task2KB can be highly useful in the development of general purpose task completion assistants.
Published: 2023

24. Pre-Training With Scientific Text Improves Educational Question Generation

Author: Muse, Hamze, Bulathwela, Sahan, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Information Retrieval, Computer Science - Machine Learning, Statistics - Machine Learning, H.3.3, J.1, I.2.0
Abstract: With the boom of digital educational materials and scalable e-learning systems, the potential for realising AI-assisted personalised learning has skyrocketed. In this landscape, the automatic generation of educational questions will play a key role, enabling scalable self-assessment when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our initial experiments demonstrate that EduQG can produce superior educational questions by pre-training on scientific text., Comment: In Proceedings of AAAI Conference on Artificial Intelligence 2023
Published: 2022

25. Towards Human-Like Educational Question Generation with Small Language Models

Author: Fawzi, Fares, Balan, Sarang, Cukurova, Mutlu, Yilmaz, Emine, Bulathwela, Sahan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Olney, Andrew M., editor, Chounta, Irene-Angelica, editor, Liu, Zitao, editor, Santos, Olga C., editor, and Bittencourt, Ig Ibert, editor
Published: 2024

26. KEIR @ ECIR 2024: The First Workshop on Knowledge-Enhanced Information Retrieval

Author: Meng, Zaiqiao, Liang, Shangsong, Xin, Xin, Moro, Gianluca, Kanoulas, Evangelos, Yilmaz, Emine, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Goharian, Nazli, editor, Tonellotto, Nicola, editor, He, Yulan, editor, Lipani, Aldo, editor, McDonald, Graham, editor, Macdonald, Craig, editor, and Ounis, Iadh, editor
Published: 2024

27. Simulated Task Oriented Dialogues for Developing Versatile Conversational Agents

Author: Wang, Xi, Sen, Procheta, Li, Ruizhe, Yilmaz, Emine, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Goharian, Nazli, editor, Tonellotto, Nicola, editor, He, Yulan, editor, Lipani, Aldo, editor, McDonald, Graham, editor, Macdonald, Craig, editor, and Ounis, Iadh, editor
Published: 2024

28. MetaASSIST: Robust Dialogue State Tracking with Meta Learning

Author: Ye, Fanghua, Wang, Xi, Huang, Jie, Li, Shenghui, Stern, Samuel, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language
Abstract: Existing dialogue datasets contain lots of noise in their state annotations. Such noise can hurt model training and ultimately lead to poor generalization performance. A general framework named ASSIST has recently been proposed to train robust dialogue state tracking (DST) models. It introduces an auxiliary model to generate pseudo labels for the noisy training set. These pseudo labels are combined with vanilla labels by a common fixed weighting parameter to train the primary DST model. Notwithstanding the improvements of ASSIST on DST, tuning the weighting parameter is challenging. Moreover, a single parameter shared by all slots and all instances may be suboptimal. To overcome these limitations, we propose a meta learning-based framework MetaASSIST to adaptively learn the weighting parameter. Specifically, we propose three schemes with varying degrees of flexibility, ranging from slot-wise to both slot-wise and instance-wise, to convert the weighting parameter into learnable functions. These functions are trained in a meta-learning manner by taking the validation set as meta data. Experimental results demonstrate that all three schemes can achieve competitive performance. Most impressively, we achieve a state-of-the-art joint goal accuracy of 80.10% on MultiWOZ 2.4., Comment: To appear at EMNLP 2022, 13 pages
Published: 2022

29. Just Mix Once: Worst-group Generalization by Group Interpolation

Author: Giannone, Giorgio, Havrylov, Serhii, Massiah, Jordan, Yilmaz, Emine, and Jiao, Yunlong
Subjects: Computer Science - Machine Learning
Abstract: Advances in deep learning theory have revealed how average generalization relies on superficial patterns in data. The consequences are brittle models that perform poorly under shifts in group distribution at test time. When group annotation is available, we can use robust optimization tools to tackle the problem. However, identification and annotation are time-consuming, especially on large datasets. A recent line of work leverages self-supervision and oversampling to improve generalization on minority groups without group annotation. We propose to unify and generalize these approaches using a class-conditional variant of mixup tailored for worst-group generalization. Our approach, Just Mix Once (JM1), interpolates samples during learning, augmenting the training distribution with a continuous mixture of groups. JM1 is domain agnostic and computationally efficient, can be used with any level of group annotation, and performs on par with or better than the state-of-the-art on worst-group generalization. Additionally, we provide a simple explanation of why JM1 works., Comment: preprint
Published: 2022

30. Evaluation Metrics for Measuring Bias in Search Engine Results

Author: Gezici, Gizem, Lipani, Aldo, Saygin, Yucel, and Yilmaz, Emine
Subjects: Computer Science - Information Retrieval
Abstract: Search engines decide what we see for a given search query. Since many people are exposed to information through search engines, it is fair to expect that search engines are neutral. However, search engine results do not necessarily cover all the viewpoints of a search query topic, and they can be biased towards a specific view since search engine results are returned based on relevance, which is calculated using many features and sophisticated algorithms where search neutrality is not necessarily the focal point. Therefore, it is important to evaluate the search engine results with respect to bias. In this work we propose novel web search bias evaluation measures which take into account the rank and relevance. We also propose a framework to evaluate web search bias using the proposed measures and test our framework on two popular search engines based on 57 controversial query topics such as abortion, medical marijuana, and gay marriage. We measure the stance bias (in support or against), as well as the ideological bias (conservative or liberal). We observe that the stance does not necessarily correlate with the ideological leaning, e.g. a positive stance on abortion indicates a liberal leaning but a positive stance on the Cuba embargo indicates a conservative leaning. Our experiments show that neither of the search engines suffers from stance bias. However, both search engines suffer from ideological bias, both favouring one ideological leaning over the other, which is more significant from the perspective of polarisation in our society.
Published: 2022

31. Integrated Weak Learning

Author: Hayes, Peter, Zhang, Mingtian, Habib, Raza, Burgess, Jordan, Yilmaz, Emine, and Barber, David
Subjects: Computer Science - Machine Learning
Abstract: We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple sources of weak supervision. We introduce a label model that can learn to aggregate weak supervision sources differently for different datapoints and takes into consideration the performance of the end-model during training. We show that our approach outperforms existing weak learning techniques across a set of 6 benchmark classification datasets. When both a small amount of labeled data and weak supervision are present the increase in performance is both consistent and large, reliably getting a 2-5 point test F1 score gain over non-integrated methods., Comment: 14 pages, 4 figures
Published: 2022

32. ViralBERT: A User Focused BERT-Based Approach to Virality Prediction

Author: Rameez, Rikaz, Rahmani, Hossein A., and Yilmaz, Emine
Subjects: Computer Science - Computation and Language, Computer Science - Social and Information Networks
Abstract: Recently, Twitter has become the social network of choice for sharing and spreading information to a multitude of users through posts called 'tweets'. Users can easily re-share these posts to other users through 'retweets', which allow information to cascade to many more users, increasing its outreach. Clearly, being able to know the extent to which a post can be retweeted has great value in advertising, influencing and other such campaigns. In this paper we propose ViralBERT, which can be used to predict the virality of tweets using content- and user-based features. We employ a method of concatenating numerical features such as hashtags and follower numbers to tweet text, and utilise two BERT modules: one for semantic representation of the combined text and numerical features, and another module purely for sentiment analysis of text, as both the information within text and its ability to elicit an emotional response play a part in retweet proneness. We collect a dataset of 330k tweets to train ViralBERT and validate the efficacy of our model using baselines from current studies in this field. Our experiments show that our approach outperforms these baselines, with a 13% increase in both F1 Score and Accuracy compared to the best performing baseline method. We then conduct an ablation study to investigate the importance of chosen features, finding that text sentiment and follower counts, and to a lesser extent mentions and following counts, are the strongest features for the model, and that hashtag counts are detrimental to the model., Comment: UMAP 2022
Published: 2022

33. Dynamic Schema Graph Fusion Network for Multi-Domain Dialogue State Tracking

Author: Feng, Yue, Lipani, Aldo, Ye, Fanghua, Zhang, Qiang, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Dialogue State Tracking (DST) aims to keep track of users' intentions during the course of a conversation. In DST, modelling the relations among domains and slots is still an under-studied problem. Existing approaches that have considered such relations generally fall short in: (1) fusing prior slot-domain membership relations and dialogue-aware dynamic slot relations explicitly, and (2) generalizing to unseen domains. To address these issues, we propose a novel Dynamic Schema Graph Fusion Network (DSGFNet), which generates a dynamic schema graph to explicitly fuse the prior slot-domain membership relations and dialogue-aware dynamic slot relations. It also uses the schemata to facilitate knowledge transfer to new domains. DSGFNet consists of a dialogue utterance encoder, a schema graph encoder, a dialogue-aware schema graph evolving network, and a schema graph enhanced dialogue state decoder. Empirical results on benchmark datasets (i.e., SGD, MultiWOZ2.1, and MultiWOZ2.2), show that DSGFNet outperforms existing methods., Comment: Accepted by ACL 2022
Published: 2022

34. ASSIST: Towards Label Noise-Robust Dialogue State Tracking

Author: Ye, Fanghua, Feng, Yue, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The MultiWOZ 2.0 dataset has greatly boosted the research on dialogue state tracking (DST). However, substantial noise has been discovered in its state annotations. Such noise brings about huge challenges for training DST models robustly. Although several refined versions, including MultiWOZ 2.1-2.4, have been published recently, there are still lots of noisy labels, especially in the training set. Besides, it is costly to rectify all the problematic annotations. In this paper, instead of improving the annotation quality further, we propose a general framework, named ASSIST (lAbel noiSe-robuSt dIalogue State Tracking), to train DST models robustly from noisy labels. ASSIST first generates pseudo labels for each sample in the training set by using an auxiliary model trained on a small clean dataset, then puts the generated pseudo labels and vanilla noisy labels together to train the primary model. We show the validity of ASSIST theoretically. Experimental results also demonstrate that ASSIST improves the joint goal accuracy of DST by up to 28.16% on MultiWOZ 2.0 and 8.41% on MultiWOZ 2.4, compared to using only the vanilla noisy labels., Comment: Findings of ACL 2022, 13 pages
Published: 2022

35. Watch Less and Uncover More: Could Navigation Tools Help Users Search and Explore Videos?

Author: Perez-Ortiz, Maria, Bulathwela, Sahan, Dormann, Claire, Verma, Meghana, Kreitmayer, Stefan, Noss, Richard, Shawe-Taylor, John, Rogers, Yvonne, and Yilmaz, Emine
Subjects: Computer Science - Information Retrieval, Computer Science - Human-Computer Interaction
Abstract: Prior research has shown how 'content preview tools' improve speed and accuracy of user relevance judgements across different information retrieval tasks. This paper describes a novel user interface tool, the Content Flow Bar, designed to allow users to quickly identify relevant fragments within informational videos to facilitate browsing, through a cognitively augmented form of navigation. It achieves this by providing semantic "snippets" that enable the user to rapidly scan through video content. The tool provides visually-appealing pop-ups that appear in a time series bar at the bottom of each video, allowing users to see in advance and at a glance how topics evolve in the content. We conducted a user study to evaluate how the tool changes the users' search experience in video retrieval, as well as how it supports exploration and information seeking. The user questionnaire revealed that participants found the Content Flow Bar helpful and enjoyable for finding relevant information in videos. The interaction logs of the user study, where participants interacted with the tool for completing two informational tasks, showed that it holds promise for enhancing discoverability of content both across and within videos. This discovered potential could leverage a new generation of navigation tools in search and information retrieval., Comment: Published at the ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR'22)
Published: 2022

36. Semantic TrueLearn: Using Semantic Knowledge Graphs in Recommendation Systems

Author: Bulathwela, Sahan, Pérez-Ortiz, María, Yilmaz, Emine, and Shawe-Taylor, John
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Statistics - Applications, Statistics - Machine Learning, H.3.3, J.1, I.2.0
Abstract: In informational recommenders, many challenges arise from the need to handle the semantic and hierarchical structure between knowledge areas. This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness between knowledge topics, propagating latent information across semantically related topics. We introduce a novel learner model that exploits this semantic relatedness between knowledge components in learning resources using the Wikipedia link graph, with the aim to better predict learner engagement and latent knowledge in a lifelong learning scenario. In this sense, Semantic TrueLearn builds a humanly intuitive knowledge representation while leveraging Bayesian machine learning to improve the predictive performance of the educational engagement. Our experiments with a large dataset demonstrate that this new semantic version of TrueLearn algorithm achieves statistically significant improvements in terms of predictive performance with a simple extension that adds semantic awareness to the model., Comment: Presented at the First International Workshop on Joint Use of Probabilistic Graphical Models and Ontology at Conference on Knowledge Graph and Semantic Web 2021
Published: 2021

37. Constructing Health and Physical Activity Knowledge in Practice: Teachers' and Students' Experiences

Author: Yilmaz, Emine Busra and Hunuk, Deniz
Abstract: Purpose: The purpose of this study was to examine teacher and student experiences in physical education when taught by teachers with high health-related fitness knowledge (HRFK): How did they construct this knowledge for students and share it with students in their teaching? Method: Four teachers and 16 of their students were interviewed. Results: Three themes emerged from the data: (a) HRFK sources of teachers and students, (b) teacher- and student-constructed HRFK in the instructional setting, and (c) students' transfer of physical activity and HRFK to their daily lives and to those around them. Conclusion: The study showed that when teachers had ample health and physical activity knowledge and transferred this knowledge to their students by designing holistic learning experiences using effective pedagogical approaches, students tended to value lifetime physical activity participation. These students were also able to influence those around them (coaches, family, and friends) to be conscious of their own health and physical activity behaviors.
Published: 2023

38. Predicting Engagement in Video Lectures

Author: Bulathwela, Sahan, Pérez-Ortiz, María, Lipani, Aldo, Yilmaz, Emine, and Shawe-Taylor, John
Abstract: The explosion of Open Educational Resources (OERs) in recent years creates the demand for scalable, automatic approaches to process and evaluate OERs, with the end goal of identifying and recommending the most suitable educational materials for learners. We focus on building models to find the characteristics and features involved in context-agnostic engagement (i.e. population-based), a seldom researched topic compared to other contextualised and personalised approaches that focus more on individual learner engagement. Learner engagement is arguably a more reliable measure than popularity/number of views, is more abundant than user ratings and has also been shown to be a crucial component in achieving learning outcomes. In this work, we explore the idea of building a predictive model for population-based engagement in education. We introduce a novel, large dataset of video lectures for predicting context-agnostic engagement and propose both cross-modal and modality specific feature sets to achieve this task. We further test different strategies for quantifying learner engagement signals. We demonstrate the use of our approach in the case of data scarcity. Additionally, we perform a sensitivity analysis of the best performing model, which shows promising performance and can be easily integrated into an educational recommender system for OERs. [For the full proceedings, see ED607784.]
Published: 2020

39. Proceedings of the CSCW 2021 Workshop -- Investigating and Mitigating Biases in Crowdsourced Data

Author: Hettiachchi, Danula, Sanderson, Mark, Goncalves, Jorge, Hosio, Simo, Kazai, Gabriella, Lease, Matthew, Schaekermann, Mike, and Yilmaz, Emine
Subjects: Computer Science - Human-Computer Interaction
Abstract: This volume contains the position papers presented at CSCW 2021 Workshop - Investigating and Mitigating Biases in Crowdsourced Data, held online on 23rd October 2021, at the 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2021). The workshop explored how specific crowdsourcing workflows, worker attributes, and work practices contribute to biases in data. The workshop also included discussions on research directions to mitigate labelling biases, particularly in a crowdsourced context, and the implications of such methods for the workers., Comment: 30 pages, More details available in the workshop website at https://sites.google.com/view/biases-in-crowdsourced-data
Published: 2021

40. Towards More Accountable Search Engines: Online Evaluation of Representation Bias

Author: Lipani, Aldo, Piroi, Florina, and Yilmaz, Emine
Subjects: Computer Science - Information Retrieval
Abstract: Information availability affects people's behavior and perception of the world. Notably, people rely on search engines to satisfy their need for information. Search engines deliver results relevant to user requests usually without being or making themselves accountable for the information they deliver, which may harm people's lives and, in turn, society. This potential risk urges the development of evaluation mechanisms of bias in order to empower the user in judging the results of search engines. In this paper, we give a possible solution to measuring representation bias with respect to societal features for search engines and apply it to evaluating the gender representation bias for Google's Knowledge Graph Carousel for listing occupations.
Published: 2021

41. Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations

Author: Liu, Fangyu, Jiao, Yunlong, Massiah, Jordan, Yilmaz, Emine, and Havrylov, Serhii
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: In NLP, a large volume of tasks involve pairwise comparison between two sequences (e.g. sentence similarity and paraphrase identification). Predominantly, two formulations are used for sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient; however, they usually underperform cross-encoders. Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance but they require task fine-tuning and are computationally more expensive. In this paper, we present a completely unsupervised sentence representation model termed Trans-Encoder that combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders. Specifically, on top of a pre-trained Language Model (PLM), we start with converting it to an unsupervised bi-encoder, and then alternate between the bi- and cross-encoder task formulations. In each alternation, one task formulation will produce pseudo-labels which are used as learning signals for the other task formulation. We then propose an extension to conduct such a self-distillation approach on multiple PLMs in parallel and use the average of their pseudo-labels for mutual-distillation. Trans-Encoder creates, to the best of our knowledge, the first completely unsupervised cross-encoder and also a state-of-the-art unsupervised bi-encoder for sentence similarity. Both the bi-encoder and cross-encoder formulations of Trans-Encoder outperform recently proposed state-of-the-art unsupervised sentence encoders such as Mirror-BERT and SimCSE by up to 5% on the sentence similarity benchmarks., Comment: ICLR 2022; code and models are released at https://github.com/amzn/trans-encoder
Published: 2021

42. Sample Efficient Model Evaluation

Author: Yilmaz, Emine, Hayes, Peter, Habib, Raza, Burgess, Jordan, and Barber, David
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Labelling data is a major practical bottleneck in training and testing classifiers. Given a collection of unlabelled data points, we address how to select which subset to label to best estimate test metrics such as accuracy, F1 score or micro/macro F1. We consider two sampling-based approaches: the well-known Importance Sampling and a novel application of Poisson Sampling. For both approaches we derive the minimal error sampling distributions and show how to approximate and use them to form estimators and confidence intervals. We show that Poisson Sampling outperforms Importance Sampling both theoretically and experimentally.
Published: 2021

43. PEEK: A Large Dataset of Learner Engagement with Educational Videos

Author: Bulathwela, Sahan, Perez-Ortiz, Maria, Novak, Erik, Yilmaz, Emine, and Shawe-Taylor, John
Subjects: Computer Science - Information Retrieval, Computer Science - Computers and Society, Computer Science - Machine Learning, H.3.3, J.1, I.2.0
Abstract: Educational recommenders have received much less attention in comparison to e-commerce and entertainment-related recommenders, even though efficient intelligent tutors have great potential to improve learning gains. One of the main challenges in advancing this research direction is the scarcity of large, publicly available datasets. In this work, we release a large, novel dataset of learners engaging with educational videos in-the-wild. The dataset, named Personalised Educational Engagement with Knowledge Topics (PEEK), is the first publicly available dataset of this nature. The video lectures have been associated with Wikipedia concepts related to the material of the lecture, thus providing a humanly intuitive taxonomy. We believe that granular learner engagement signals in unison with rich content representations will pave the way to building powerful personalization algorithms that will revolutionise educational and informational recommendation systems. Towards this goal, we 1) construct a novel dataset from a popular video lecture repository, 2) identify a set of benchmark algorithms to model engagement, and 3) run extensive experimentation on the PEEK dataset to demonstrate its value. Our experiments with the dataset show promise in building powerful informational recommender systems. The dataset and the support code are available publicly., Comment: To be published at ORSUM '21: 4th Workshop on Online Recommender Systems and User Modeling at ACM RecSys 2021
Published: 2021

44. Estimation of Fair Ranking Metrics with Incomplete Judgments

Author: Kırnap, Ömer, Diaz, Fernando, Biega, Asia, Ekstrand, Michael, Carterette, Ben, and Yılmaz, Emine
Subjects: Computer Science - Information Retrieval, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: There is increasing attention to evaluating the fairness of search system ranking decisions. These metrics often consider the membership of items to particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation., Comment: Published in Proceedings of the Web Conference 2021 (WWW '21)
Published: 2021

45. Learning to Detect Few-Shot-Few-Clue Misinformation

Author: Zhang, Qiang, Huang, Hongbin, Liang, Shangsong, Meng, Zaiqiao, and Yilmaz, Emine
Subjects: Computer Science - Social and Information Networks
Abstract: The quality of digital information on the web has been disquieting due to the lack of careful manual review. Consequently, a large volume of false textual information has been disseminating for a long time since the prevalence of social media. The potential negative influence of misinformation on the public is a growing concern. Therefore, it is strongly motivated to detect online misinformation as early as possible. Few-shot-few-clue learning applies in this misinformation detection task when the number of annotated statements is quite small (called few shots) and the corresponding evidence is also quite limited in each shot (called few clues). Within the few-shot-few-clue framework, we propose a Bayesian meta-learning algorithm to extract the shared patterns among different topics (i.e., different tasks) of misinformation. Moreover, we derive a scalable method, i.e., amortized variational inference, to optimize the Bayesian meta-learning algorithm. Empirical results on three benchmark datasets demonstrate the superiority of our algorithm. This work focuses more on optimizing parameters than designing detection models, and will generate fresh insights into data-efficient detection of online misinformation at early stages.
Published: 2021

46. Sensitive quantification of acetochlor and metolachlor in water using Taguchi-optimized DLLME coupled with high-performance liquid chromatography

Author: Altiparmak, Ezgi, Yilmaz, Emine, Dadaser-Celik, Filiz, and Ates, Nuray
Published: 2024

47. MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

Author: Craswell, Nick, Mitra, Bhaskar, Yilmaz, Emine, Campos, Daniel, and Lin, Jimmy
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field. However, the goal is not simply to identify which run is "best", achieving the top score. The goal is to move the field forward by developing new robust techniques that work in many different settings and are adopted in research and practice. This paper uses the MS MARCO and TREC Deep Learning Track as our case study, comparing it to the case of TREC ad hoc ranking in the 1990s. We show how the design of the evaluation effort can encourage or discourage certain outcomes, and raise questions about internal and external validity of results. We provide some analysis of certain pitfalls, and a statement of best practices for avoiding such pitfalls. We summarize the progress of the effort so far, and describe our desired end state of "robust usefulness", along with steps that might be required to get us there.
Published: 2021

48. TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime

Author: Craswell, Nick, Mitra, Bhaskar, Yilmaz, Emine, Campos, Daniel, Voorhees, Ellen M., and Soboroff, Ian
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available. Results so far indicate that the best models with large data may be deep neural networks. This paper supports the reuse of the TREC DL test collections in three ways. First we describe the data sets in detail, documenting clearly and in one place some details that are otherwise scattered in track guidelines, overview papers and in our associated MS MARCO leaderboard pages. We intend this description to make it easy for newcomers to use the TREC DL data. Second, because there is some risk of iteration and selection bias when reusing a data set, we describe the best practices for writing a paper using TREC DL data, without overfitting. We provide some illustrative analysis. Finally we address a number of issues around the TREC DL data, including an analysis of reusability., Comment: arXiv admin note: text overlap with arXiv:2003.07820
Published: 2021

49. MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with Essential Annotation Corrections to Improve State Tracking Evaluation

Author: Ye, Fanghua, Manotumruksa, Jarana, and Yilmaz, Emine
Subjects: Computer Science - Computation and Language
Abstract: The MultiWOZ 2.0 dataset has greatly stimulated the research of task-oriented dialogue systems. However, its state annotations contain substantial noise, which hinders a proper evaluation of model performance. To address this issue, massive efforts were devoted to correcting the annotations. Three improved versions (i.e., MultiWOZ 2.1-2.3) have then been released. Nonetheless, there are still plenty of incorrect and inconsistent annotations. This work introduces MultiWOZ 2.4, which refines the annotations in the validation set and test set of MultiWOZ 2.1. The annotations in the training set remain unchanged (same as MultiWOZ 2.1) to elicit robust and noise-resilient model training. We benchmark eight state-of-the-art dialogue state tracking models on MultiWOZ 2.4. All of them demonstrate much higher performance than on MultiWOZ 2.1.
Comment: Accepted to SIGDIAL 2022 (https://2022.sigdial.org/)
Published: 2021

50. Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard

Author: Lin, Jimmy, Campos, Daniel, Craswell, Nick, Mitra, Bhaskar, and Yilmaz, Emine
Subjects: Computer Science - Information Retrieval
Abstract: Leaderboards are a ubiquitous part of modern research in applied machine learning. By design, they sort entries into some linear order, where the top-scoring entry is recognized as the "state of the art" (SOTA). Due to the rapid progress being made in information retrieval today, particularly with neural models, the top entry in a leaderboard is replaced with some regularity. These are touted as improvements in the state of the art. Such pronouncements, however, are almost never qualified with significance testing. In the context of the MS MARCO document ranking leaderboard, we pose a specific question: How do we know if a run is significantly better than the current SOTA? We ask this question against the backdrop of recent IR debates on scale types: in particular, whether commonly used significance tests are even mathematically permissible. Recognizing these potential pitfalls in evaluation methodology, our study proposes an evaluation framework that explicitly treats certain outcomes as distinct and avoids aggregating them into a single-point metric. Empirical analysis of SOTA runs from the MS MARCO document ranking leaderboard reveals insights about how one run can be "significantly better" than another that are obscured by the current official evaluation metric (MRR@100).
Published: 2021
