44 results for "de Rijke, Maarten"
Search Results
2. Incremental sparse Bayesian ordinal regression.
- Author
-
Li, Chang and de Rijke, Maarten
- Subjects
-
*BAYESIAN analysis, *MACHINE learning, *REGRESSION analysis, *ALGORITHMS, *MAXIMUM likelihood statistics
- Abstract
Ordinal Regression (OR) aims to model the ordering information between different data categories, which is a crucial topic in multi-label learning. An important class of approaches to OR models the problem as a linear combination of basis functions that map features to a high-dimensional non-linear space. However, most basis function-based algorithms are time-consuming. We propose an incremental sparse Bayesian approach to OR tasks and introduce an algorithm to sequentially learn the relevant basis functions in the ordinal scenario. Our method, called Incremental Sparse Bayesian Ordinal Regression (ISBOR), automatically optimizes the hyper-parameters via the type-II maximum likelihood method. By exploiting fast marginal likelihood optimization, ISBOR avoids large matrix inversions, the main bottleneck in applying basis function-based algorithms to OR tasks on large-scale datasets. We show that ISBOR can make accurate predictions with parsimonious basis functions while offering automatic estimates of the prediction uncertainty. Extensive experiments on synthetic and real-world datasets demonstrate the efficiency and effectiveness of ISBOR compared to other basis function-based OR approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
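The basis-function setup this abstract refers to can be illustrated with a minimal sketch (not ISBOR itself): an ordinal model scores each input by a linear combination of basis functions and assigns the category whose threshold interval contains the score. All values and names below (RBF basis, weights, thresholds) are invented for illustration.

```python
import numpy as np

def rbf_basis(X, centers, gamma=1.0):
    """Map features into a non-linear space via RBF basis functions."""
    # Pairwise squared distances between samples and basis centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def predict_ordinal(X, centers, w, thresholds):
    """Assign each sample the ordinal label whose threshold interval
    contains its latent score f(x) = w . phi(x)."""
    f = rbf_basis(X, centers) @ w  # latent scores, one per sample
    # searchsorted returns the index of the first threshold exceeding
    # each score, which is exactly the ordinal category index.
    return np.searchsorted(thresholds, f)

# Toy example: 1-D inputs, two basis centers, three ordered classes.
X = np.array([[0.0], [1.0], [2.0]])
centers = np.array([[0.0], [2.0]])
w = np.array([1.0, 3.5])
thresholds = np.array([1.5, 2.5])  # boundaries between the 3 classes
print(predict_ordinal(X, centers, w, thresholds))  # → [0 1 2]
```

Sparse Bayesian methods such as the one described here learn which basis functions (and how many) to keep, rather than fixing them in advance.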
3. Evaluating the Robustness of Click Models to Policy Distributional Shift.
- Author
-
DEFFAYET, ROMAIN, RENDERS, JEAN-MICHEL, and DE RIJKE, MAARTEN
- Abstract
The article primarily addresses the evaluation of click models used in search engines and their sensitivity to changes in ranking policies, known as policy distributional shift (PDS). It explores the impact of PDS on click models and proposes a new evaluation protocol to assess their robustness in various downstream tasks, such as click-through rate prediction and offline bandits, providing insights and guidelines for handling policy changes in click model deployments.
- Published
- 2023
- Full Text
- View/download PDF
4. The Importance of Length Normalization for XML Retrieval.
- Author
-
Kamps, Jaap, de Rijke, Maarten, and Sigurbjörnsson, Börkur
- Subjects
-
*XML (Extensible Markup Language), *COMPARATIVE studies, *DATABASES, *DOCUMENT markup languages, *COMPUTER systems, *HYPERTEXT systems
- Abstract
XML retrieval is a departure from standard document retrieval in which each individual XML element, ranging from italicized words or phrases to full-blown articles, is a retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of element length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length bias introduced by the amount of smoothing, and show the importance of extreme length bias for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate element length normalization. Even after restricting the minimal size of XML elements occurring in the index, the importance of an extreme explicit length bias remains. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
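The interaction between smoothing and length bias discussed in this abstract can be made concrete with a small sketch of Dirichlet-smoothed language-model scoring plus an explicit log-length prior. The parameter names (`mu`, `length_bias`) and the exact formulation are illustrative assumptions, not the article's model.

```python
import math
from collections import Counter

def lm_score(element_tokens, query_tokens, collection_tf, collection_len,
             mu=2000.0, length_bias=1.0):
    """Score an XML element for a query with a Dirichlet-smoothed
    language model plus an explicit log-length prior.

    A larger `mu` smooths more heavily toward collection statistics
    (which itself induces a length bias); `length_bias` adds a direct
    preference for longer elements on top of that.
    """
    tf = Counter(element_tokens)
    n = len(element_tokens)
    log_p = 0.0
    for t in query_tokens:
        # Background probability; 0.5 is a crude fallback for unseen terms.
        p_coll = collection_tf.get(t, 0.5) / collection_len
        log_p += math.log((tf[t] + mu * p_coll) / (n + mu))
    # Explicit length prior: favour longer elements.
    return log_p + length_bias * math.log(n)

# Toy collection statistics and two elements of different length.
coll = Counter("xml retrieval element length xml element".split())
clen = sum(coll.values())
short = "xml retrieval".split()
long_ = "xml retrieval is element retrieval over xml documents".split()
q = "xml retrieval".split()
# With a heavy smoothing mass and a length prior, the longer element wins.
print(lm_score(short, q, coll, clen) < lm_score(long_, q, coll, clen))  # → True
```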
5. Few-shot Learning for Heterogeneous Information Networks.
- Author
-
Fang, Yang, Zhao, Xiang, Xiao, Weidong, and de Rijke, Maarten
- Abstract
The article introduces a dual meta-learning (DML) technique to enhance semi-supervised text classification, improving both teacher and student classifiers iteratively. Topics include the challenge of noisy pseudo-labels in semi-supervised text classification, the proposed meta-learning methods for noise correction and pseudo supervision, and the experimental validation demonstrating the effectiveness of the DML framework.
- Published
- 2024
- Full Text
- View/download PDF
6. A system of dynamic modal logic.
- Author
-
De Rijke, Maarten
- Subjects
-
*MODAL analysis
- Abstract
Studies the language of dynamic modal logic (DML) in relation to cognitive states. Topics include the technical aspects of dynamic modal logic, the expressive power of the language, and how the logic can be used to capture situations of dynamic interest.
- Published
- 1998
- Full Text
- View/download PDF
7. A Survey on Variational Autoencoders in Recommender Systems.
- Author
-
Liang, Shangsong, Pan, Zhou, Liu, Wei, Yin, Jian, and de Rijke, Maarten
- Published
- 2024
- Full Text
- View/download PDF
8. Listwise Generative Retrieval Models via a Sequential Learning Process.
- Author
-
Tang, Yubao, Zhang, Ruqing, Guo, Jiafeng, de Rijke, Maarten, Chen, Wei, and Cheng, Xueqi
- Abstract
The article focuses on improving generative retrieval (GR) models by introducing a listwise approach that optimizes relevance at the document list level, unlike existing models that use pointwise approaches. It notes that, by treating the generation of a ranked document list as a sequential learning process, the proposed method maximizes the generation likelihood of each document given the preceding documents in the list, addressing the sub-optimality of pointwise approaches.
- Published
- 2024
- Full Text
- View/download PDF
9. A taxonomy, data set, and benchmark for detecting and classifying malevolent dialogue responses.
- Author
-
Zhang, Yangjun, Ren, Pengjie, and de Rijke, Maarten
- Subjects
-
*CONVERSATION, *NATURAL language processing, *SOCIAL media, *BEHAVIOR disorders, *BENCHMARKING (Management), *INTERPERSONAL relations, *ANTISOCIAL personality disorders, *AGGRESSION (Psychology), *EMOTIONS
- Abstract
Conversational interfaces are increasingly popular as a way of connecting people to information. With the increased generative capacity of corpus‐based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on the topic of detecting and classifying inappropriate content are mostly focused on a specific category of malevolence or on single sentences instead of an entire dialogue. We make three contributions to advance research on the malevolent dialogue response detection and classification (MDRDC) task. First, we define the task and present a hierarchical malevolent dialogue taxonomy. Second, we create a labeled multiturn dialogue data set and formulate the MDRDC task as a hierarchical classification task. Last, we apply state‐of‐the‐art text classification methods to the MDRDC task, and report on experiments aimed at assessing the performance of these approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
10. A Large-scale Analysis of Mixed Initiative in Information-Seeking Dialogues for Conversational Search.
- Author
-
VAKULENKO, SVITLANA, KANOULAS, EVANGELOS, and DE RIJKE, MAARTEN
- Subjects
-
*ARTIFICIAL intelligence, *ACQUISITION of data
- Abstract
Conversational search is a relatively young area of research that aims at automating an information-seeking dialogue. In this article, we help to position it with respect to other research areas within conversational artificial intelligence (AI) by analysing the structural properties of an information-seeking dialogue. To this end, we perform a large-scale dialogue analysis of more than 150K transcripts from 16 publicly available dialogue datasets. These datasets were collected to inform different dialogue-based tasks including conversational search. We extract different patterns of mixed initiative from these dialogue transcripts and use them to compare dialogues of different types. Moreover, we contrast the patterns found in information-seeking dialogues that are being used for research purposes with the patterns found in virtual reference interviews that were conducted by professional librarians. The insights we provide (1) establish close relations between conversational search and other conversational AI tasks and (2) uncover limitations of existing conversational datasets to inform future data collection tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
11. Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems.
- Author
-
WEIWEI SUN, SHUYU GUO, SHUO ZHANG, PENGJIE REN, ZHUMIN CHEN, DE RIJKE, MAARTEN, and ZHAOCHUN REN
- Abstract
The article focuses on improving the evaluation of task-oriented dialogue systems (TDSs) through the use of user simulators. Topics include the limitations of current TDS evaluation methods, the challenges of employing user simulators for TDS evaluation, and the proposed solution which involves a metaphorical user simulator (MetaSim) and a tester-based evaluation framework.
- Published
- 2024
- Full Text
- View/download PDF
12. A Research Agenda for Hybrid Intelligence: Augmenting Human Intellect With Collaborative, Adaptive, Responsible, and Explainable Artificial Intelligence.
- Author
-
Akata, Zeynep, Balliet, Dan, de Rijke, Maarten, Dignum, Frank, Dignum, Virginia, Eiben, Guszti, Fokkens, Antske, Grossi, Davide, Hindriks, Koen, Hoos, Holger, Hung, Hayley, Jonker, Catholijn, Monz, Christof, Neerincx, Mark, Oliehoek, Frans, Prakken, Henry, Schlobach, Stefan, van der Gaag, Linda, van Harmelen, Frank, and van Hoof, Herke
- Subjects
-
*INTELLECT, *ARTIFICIAL intelligence, *TASK analysis
- Abstract
We define hybrid intelligence (HI) as the combination of human and machine intelligence, augmenting human intellect and capabilities instead of replacing them and achieving goals that were unreachable by either humans or machines. HI is an important new research focus for artificial intelligence, and we set a research agenda for HI by formulating four challenges. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
13. Safe Exploration for Optimizing Contextual Bandits.
- Author
-
JAGERMAN, ROLF, MARKOV, ILYA, and DE RIJKE, MAARTEN
- Subjects
-
*CONTEXTUAL learning, *INFORMATION retrieval, *ALGORITHMS, *ONLINE education, *RANKING (Statistics)
- Abstract
Contextual bandit problems are a natural fit for many information retrieval tasks, such as learning to rank, text classification, recommendation, and so on. However, existing learning methods for contextual bandit problems have one of two drawbacks: They either do not explore the space of all possible document rankings (i.e., actions) and, thus, may miss the optimal ranking, or they present suboptimal rankings to a user and, thus, may harm the user experience. We introduce a new learning method for contextual bandit problems, Safe Exploration Algorithm (SEA), which overcomes the above drawbacks. SEA starts by using a baseline (or production) ranking system (i.e., policy), which does not harm the user experience and, thus, is safe to execute but has suboptimal performance and, thus, needs to be improved. Then SEA uses counterfactual learning to learn a new policy based on the behavior of the baseline policy. SEA also uses high-confidence off-policy evaluation to estimate the performance of the newly learned policy. Once the performance of the newly learned policy is at least as good as the performance of the baseline policy, SEA starts using the new policy to execute new actions, allowing it to actively explore favorable regions of the action space. This way, SEA never performs worse than the baseline policy and, thus, does not harm the user experience, while still exploring the action space and, thus, being able to find an optimal policy. Our experiments using text classification and document retrieval confirm the above by comparing SEA (and a boundless variant called BSEA) to online and offline learning methods for contextual bandit problems. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
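The safety test at the heart of the scheme this abstract describes can be sketched with a generic inverse-propensity-scoring (IPS) estimate and a Hoeffding-style lower confidence bound. The weight cap, the bound, and all names here are illustrative simplifications of high-confidence off-policy evaluation, not SEA's actual estimator.

```python
import math

def ips_lower_bound(logs, new_policy_prob, delta=0.05, w_max=10.0):
    """Estimate a new policy's expected reward from interactions logged
    under the baseline policy, and return a lower confidence bound.

    `logs` holds (context, action, reward, baseline_prob) tuples;
    `new_policy_prob(context, action)` is the new policy's probability
    of the logged action. Importance weights are capped at w_max so a
    Hoeffding bound for values in [0, w_max] applies.
    """
    vals = [min(new_policy_prob(x, a) / p, w_max) * r for (x, a, r, p) in logs]
    n = len(vals)
    mean = sum(vals) / n
    return mean - w_max * math.sqrt(math.log(1.0 / delta) / (2 * n))

def act_safely(context, baseline, candidate, candidate_lcb, baseline_value):
    """Only act with the candidate policy once its lower confidence
    bound matches the baseline's estimated value; stay safe otherwise."""
    if candidate_lcb >= baseline_value:
        return candidate(context)
    return baseline(context)

# With 10,000 logged interactions where the new policy agrees with the
# baseline (importance weight 1) and every reward is 1, the bound
# tightens enough for the candidate policy to be deployed.
logs = [(None, 0, 1.0, 0.5)] * 10_000
lcb = ips_lower_bound(logs, lambda x, a: 0.5)
print(act_safely(None, lambda x: "baseline", lambda x: "candidate", lcb, 0.8))
# → candidate
```

The key property mirrored here is the one the abstract emphasizes: the system never switches away from the baseline until the new policy is provably at least as good.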
14. A Next Basket Recommendation Reality Check.
- Author
-
MING LI, JULLIEN, SAMI, ARIANNEZHAD, MOZHDEH, and DE RIJKE, MAARTEN
- Abstract
The article focuses on evaluating Next Basket Recommendation (NBR) methods for suggesting items in users' future shopping baskets. It discusses the distinction between repeat and explore items in recommendations, introduces novel metrics for NBR evaluation, and highlights the variations in method performance across different datasets and scenarios, shedding light on the challenges and opportunities in this area of research.
- Published
- 2023
- Full Text
- View/download PDF
15. PRADA: Practical Black-box Adversarial Attacks against Neural Ranking Models.
- Author
-
CHEN WU, RUQING ZHANG, JIAFENG GUO, DE RIJKE, MAARTEN, YIXING FAN, and XUEQI CHENG
- Abstract
The article examines the vulnerability of neural ranking models (NRMs) to adversarial attacks, emphasizing the Word Substitution Ranking Attack (WSRA) task, which aims to raise a target document's ranking by subtly altering its text. It covers topics such as proposing a novel PRADA method for generating adversarial examples, conducting experiments on web search datasets, and highlighting the importance of identifying NRM vulnerabilities to enhance model robustness.
- Published
- 2023
- Full Text
- View/download PDF
16. Improving Transformer-based Sequential Recommenders through Preference Editing.
- Author
-
MUYANG MA, PENGJIE REN, ZHUMIN CHEN, ZHAOCHUN REN, HUASHENG LIANG, JUN MA, and DE RIJKE, MAARTEN
- Abstract
One of the key challenges in sequential recommendation is how to extract and represent user preferences. Traditional methods rely solely on predicting the next item. But user behavior may be driven by complex preferences. Therefore, these methods cannot make accurate recommendations when the available information about user behavior is limited. To explore multiple user preferences, we propose a transformer-based sequential recommendation model, named MrTransformer (Multi-preference Transformer). For training MrTransformer, we devise a preference-editing-based self-supervised learning (SSL) mechanism that explores extra supervision signals based on relations with other sequences. The idea is to force the sequential recommendation model to discriminate between common and unique preferences in different sequences of interactions. By doing so, the sequential recommendation model is able to disentangle user preferences into multiple independent preference representations so as to improve user preference extraction and representation. We carry out extensive experiments on five benchmark datasets. MrTransformer with preference editing significantly outperforms state-of-the-art sequential recommendation methods in terms of Recall, MRR, and NDCG. We find that long sequences of interactions, from which user preferences are harder to extract and represent, benefit most from preference editing. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Neural information retrieval: introduction to the special issue.
- Author
-
Craswell, Nick, Croft, W. Bruce, de Rijke, Maarten, Guo, Jiafeng, and Mitra, Bhaskar
- Subjects
-
*NEURAL circuitry, *ARTIFICIAL intelligence, *ARTIFICIAL neural networks, *NATURAL language processing, *DATA analysis
- Published
- 2018
- Full Text
- View/download PDF
18. The birth of collective memories: Analyzing emerging entities in text streams.
- Author
-
Graus, David, Odijk, Daan, and de Rijke, Maarten
- Subjects
-
*INFORMATION resources management, *MEMORY, *PRESS, *RESEARCH funding, *TIME series analysis, *WORLD Wide Web, *REFERENCE sources, *SOCIAL media
- Abstract
We study how collective memories are formed online. We do so by tracking entities that emerge in public discourse, that is, in online text streams such as social media and news streams, before they are incorporated into Wikipedia, which, we argue, can be viewed as an online place for collective memory. By tracking how entities emerge in public discourse, that is, the temporal patterns between their first mention in online text streams and subsequent incorporation into collective memory, we gain insights into how the collective remembrance process happens online. Specifically, we analyze nearly 80,000 entities as they emerge in online text streams before they are incorporated into Wikipedia. The online text streams we use for our analysis comprise social media and news streams, and span over 579 million documents over a period of 18 months. We discover two main emergence patterns: entities that emerge in a "bursty" fashion, that is, that appear in public discourse without a precedent, blast into activity, and transition into collective memory. Other entities display a "delayed" pattern, where they appear in public discourse, experience a period of inactivity, and then resurface before transitioning into our cultural collective memory. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
19. Behavior-based personalization in web search.
- Author
-
Cai, Fei, Wang, Shuaiqiang, and de Rijke, Maarten
- Subjects
-
*ALGORITHMS, *DATABASE management, *PROBABILITY theory, *T-test (Statistics), *USER interfaces, *SEARCH engines, *MAXIMUM likelihood statistics
- Abstract
Personalized search approaches tailor search results to users' current interests, so as to help improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a different type of signal: We investigate the use of behavioral information for the purpose of search personalization. That is, we consider clicks and dwell time for reranking an initially retrieved list of documents. In particular, we (i) investigate the impact of distributions of users and queries on document reranking; (ii) estimate the relevance of a document for a query at two levels, the query level and the word level, to alleviate the problem of sparseness; and (iii) perform an experimental evaluation both for users seen during the training period and for users not seen during training. For the latter, we explore the use of information from similar users who have been seen during the training period. We use the dwell time on clicked documents to estimate a document's relevance to a query, and perform Bayesian probabilistic matrix factorization to generate a relevance distribution of a document over queries. Our experiments show that: (i) for personalized ranking, behavioral information helps to improve retrieval effectiveness; and (ii) given a query, merging information inferred from the behavior of a particular user and from the behaviors of other users with a user-dependent adaptive weight outperforms any combination with a fixed weight. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
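The user-dependent adaptive weighting this abstract contrasts with a fixed weight can be sketched as follows. The shrinkage form and the parameter `tau` are assumptions for illustration, not the paper's estimator.

```python
def merged_relevance(own_score, peer_score, n_user_observations, tau=10.0):
    """Merge relevance inferred from this user's own clicks and dwell
    time with relevance inferred from similar users. The weight on the
    user's own signal grows as more of their behavior is observed,
    instead of being fixed for all users."""
    alpha = n_user_observations / (n_user_observations + tau)
    return alpha * own_score + (1 - alpha) * peer_score

# A user seen only twice leans on similar users; a heavy user does not.
print(merged_relevance(0.9, 0.3, n_user_observations=2))    # → 0.4
print(merged_relevance(0.9, 0.3, n_user_observations=200))  # mostly own signal
```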
20. Prefix-Adaptive and Time-Sensitive Personalized Query Auto Completion.
- Author
-
Cai, Fei, Liang, Shangsong, and de Rijke, Maarten
- Subjects
-
*QUERY (Information retrieval system), *SEARCH engine optimization, *WEB personalization, *DATA structures, *ELECTRONIC services, *EMAIL
- Abstract
Query auto completion (QAC) methods recommend queries to search engine users when they start entering a query. Current QAC methods mostly rank query completions based on their past popularity, i.e., on the number of times they have previously been submitted as a query. However, query popularity changes over time and may vary drastically across users. Accordingly, the ranking of query completions should be adjusted. Previous time-sensitive and user-specific QAC methods have been developed separately, yielding significant improvements over methods that are neither time-sensitive nor personalized. We propose a hybrid QAC method that is both time-sensitive and personalized. We extend it to handle long-tail prefixes, which we achieve by assigning optimal weights to the contribution from time-sensitivity and personalization. Using real-world search log datasets, we return the top N query suggestions ranked by predicted popularity as estimated from popularity trends and cyclic popularity behavior; we rerank them by integrating similarities to a user's previous queries (both in the current session and in previous sessions). Our method outperforms state-of-the-art time-sensitive QAC baselines, achieving total improvements of between 3 and 7 percent in terms of mean reciprocal rank (MRR). After optimizing the weights, our extended model achieves MRR improvements of between 4 and 8 percent. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
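MRR, the metric reported in this abstract, takes only a few lines to compute; the toy suggestion lists below are invented for illustration.

```python
def mean_reciprocal_rank(ranked_suggestions, submitted_queries):
    """MRR over prefixes: for each prefix, take the reciprocal of the
    (1-based) rank at which the query the user actually submitted
    appears in the suggestion list, or 0 if it is absent."""
    total = 0.0
    for suggestions, submitted in zip(ranked_suggestions, submitted_queries):
        rr = 0.0
        for rank, s in enumerate(suggestions, start=1):
            if s == submitted:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(submitted_queries)

lists = [["query auto completion", "query autocorrect", "query language"],
         ["maarten de rijke", "maarten van rossum"]]
truth = ["query autocorrect", "maarten de rijke"]
print(mean_reciprocal_rank(lists, truth))  # (1/2 + 1/1) / 2 → 0.75
```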
21. Diversifying Query Auto-Completion.
- Author
-
FEI CAI, REINANDA, RIDHO, and DE RIJKE, MAARTEN
- Subjects
-
*QUERY (Information retrieval system), *SEMANTICS, *AUTOMATIC control systems, *COMPUTER science, *INTERNET searching
- Abstract
Query auto-completion assists web search users in formulating queries with a few keystrokes, helping them to avoid spelling mistakes and to produce clear query expressions, and so on. Previous work on query auto-completion mainly centers around returning a list of completions to users, aiming to push queries that are most likely intended by the user to the top positions but ignoring the redundancy among the query candidates in the list. Thus, semantically related queries matching the input prefix are often returned together. This may push valuable suggestions out of the list, given that only a limited number of candidates can be shown to the user, which may result in a less than optimal search experience. In this article, we consider the task of diversifying query auto-completion, which aims to return the correct query completions early in a ranked list of candidate completions and at the same time reduce the redundancy among query auto-completion candidates. We develop a greedy query selection approach that predicts query completions based on the current search popularity of candidate completions and on the aspects of previous queries in the same search session. The popularity of completion candidates at query time can be directly aggregated from query logs. However, query aspects are implicitly expressed by previously clicked documents in the search context. To determine the query aspect, we categorize clicked documents of a query using a hierarchy based on the Open Directory Project. Bayesian probabilistic matrix factorization is applied to derive the distribution of queries over all aspects. We quantify the improvement of our greedy query selection model against a state-of-the-art baseline using two large-scale, real-world query logs and show that it beats the baseline in terms of well-known metrics used in query auto-completion and diversification. In addition, we conduct a side-by-side experiment to verify the effectiveness of our proposal. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
22. Scalable Representation Learning for Dynamic Heterogeneous Information Networks via Metagraphs.
- Author
-
YANG FANG, XIANG ZHAO, PEIXIN HUANG, WEIDONG XIAO, and DE RIJKE, MAARTEN
- Subjects
-
*INFORMATION networks, *TIMESTAMPS, *INFORMATION retrieval, *DYNAMIC models
- Abstract
Content representation is a fundamental task in information retrieval. Representation learning is aimed at capturing features of an information object in a low-dimensional space. Most research on representation learning for heterogeneous information networks (HINs) focuses on static HINs. In practice, however, networks are dynamic and subject to constant change. In this article, we propose a novel and scalable representation learning model, M-DHIN, to explore the evolution of a dynamic HIN. We regard a dynamic HIN as a series of snapshots with different time stamps. We first use a static embedding method to learn the initial embeddings of a dynamic HIN at the first time stamp. We describe the features of the initial HIN via metagraphs, which retain more structural and semantic information than traditional path-oriented static models. We also adopt a complex embedding scheme to better distinguish between symmetric and asymmetric metagraphs. Unlike traditional models that process an entire network at each time stamp, we build a so-called change dataset that only includes nodes involved in a triadic closure or opening process, as well as newly added or deleted nodes. Then, we utilize the above metagraph-based mechanism to train on the change dataset. As a result of this setup, M-DHIN is scalable to large dynamic HINs since it only needs to model the entire HIN once while only the changed parts need to be processed over time. Existing dynamic embedding models only express the existing snapshots and cannot predict the future network structure. To equip M-DHIN with this ability, we introduce an LSTM-based deep autoencoder model that processes the evolution of the graph via an LSTM encoder and outputs the predicted graph. Finally, we evaluate the proposed model, M-DHIN, on real-life datasets and demonstrate that it significantly and consistently outperforms state-of-the-art models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
23. Hyperspherical Variational Co-embedding for Attributed Networks.
- Author
-
JINYUAN FANG, SHANGSONG LIANG, ZAIQIAO MENG, and DE RIJKE, MAARTEN
- Subjects
-
*VIRTUAL networks, *SOCIAL networks, *INFORMATION retrieval
- Abstract
Network-based information has been widely explored and exploited in the information retrieval literature. Attributed networks, consisting of nodes and edges, as well as attributes describing properties of nodes, are a basic type of network-based data, and are especially useful for many applications. Examples include user profiling in social networks and item recommendation in user-item purchase networks. Learning useful and expressive representations of entities in attributed networks can provide more effective building blocks for downstream network-based tasks such as link prediction and attribute inference. Practically, input features of attributed networks are normalized as unit directional vectors. However, most network embedding techniques ignore the spherical nature of inputs and focus on learning representations in a Gaussian or Euclidean space, which, we hypothesize, might lead to less effective representations. To obtain more effective representations of attributed networks, we investigate the problem of mapping an attributed network with unit normalized directional features into a non-Gaussian and non-Euclidean space. Specifically, we propose a hyperspherical variational co-embedding for attributed networks (HCAN), which is based on generalized variational auto-encoders for heterogeneous data with multiple types of entities. HCAN jointly learns latent embeddings for both nodes and attributes in a unified hyperspherical space such that the affinities between nodes and attributes can be captured effectively. We argue that this is a crucial feature in many real-world applications of attributed networks. Previous Gaussian network embedding algorithms break the assumption of uninformative prior, which leads to unstable results and poor performance. In contrast, HCAN embeds nodes and attributes as von Mises-Fisher distributions, and allows one to capture the uncertainty of the inferred representations. Experimental results on eight datasets show that HCAN yields better performance in a number of applications compared with nine state-of-the-art baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
24. Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods.
- Author
-
HOFMANN, KATJA, WHITESON, SHIMON, and DE RIJKE, MAARTEN
- Subjects
-
*SEARCH engines, *LEARNING, *QUERY languages (Computer science), *CODING theory, *LOYALTY
- Abstract
Ranker evaluation is central to the research into search engines, be it to compare rankers or to provide feedback for learning to rank. Traditional evaluation approaches do not scale well because they require explicit relevance judgments of document-query pairs, which are expensive to obtain. A promising alternative is the use of interleaved comparison methods, which compare rankers using click data obtained when interleaving their rankings. In this article, we propose a framework for analyzing interleaved comparison methods. An interleaved comparison method has fidelity if the expected outcome of ranker comparisons properly corresponds to the true relevance of the ranked documents. It is sound if its estimates of that expected outcome are unbiased and consistent. It is efficient if those estimates are accurate with only little data. We analyze existing interleaved comparison methods and find that, while sound, none meet our criteria for fidelity. We propose a probabilistic interleave method, which is sound and has fidelity. We show empirically that, by marginalizing out variables that are known, it is more efficient than existing interleaved comparison methods. Using importance sampling we derive a sound extension that is able to reuse historical data collected in previous comparisons of other ranker pairs. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
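The mechanics of an interleaved comparison can be illustrated with team-draft interleaving, a standard method in this family (the probabilistic interleave method this article proposes refines the idea). The implementation below is a sketch, not the article's algorithm.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random.Random(0)):
    """Merge two rankings, alternating picks between the two 'teams' and
    recording which ranker contributed each document."""
    rankings = {"A": ranking_a, "B": ranking_b}
    pos = {"A": 0, "B": 0}    # next candidate index per ranking
    picks = {"A": 0, "B": 0}  # documents contributed so far

    def next_unseen(team, seen):
        # Advance past documents the other team already contributed.
        while pos[team] < len(rankings[team]) and rankings[team][pos[team]] in seen:
            pos[team] += 1
        return rankings[team][pos[team]] if pos[team] < len(rankings[team]) else None

    interleaved, teams, seen = [], [], set()
    while True:
        available = [t for t in ("A", "B") if next_unseen(t, seen) is not None]
        if not available:
            break
        if len(available) == 2 and picks["A"] == picks["B"]:
            team = rng.choice(available)  # tie: random draft order
        else:
            team = min(available, key=lambda t: picks[t])
        doc = next_unseen(team, seen)
        interleaved.append(doc)
        teams.append(team)
        seen.add(doc)
        picks[team] += 1
        pos[team] += 1
    return interleaved, teams

def compare(teams, clicked_positions):
    """Credit each click to the team that placed the clicked document."""
    a = sum(1 for i in clicked_positions if teams[i] == "A")
    b = sum(1 for i in clicked_positions if teams[i] == "B")
    return "A" if a > b else ("B" if b > a else "tie")
```

Clicks on the interleaved list are then credited to the ranker whose team placed each clicked document, yielding a per-impression comparison outcome.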
25. Multi-interest Diversification for End-to-end Sequential Recommendation.
- Author
-
WANYU CHEN, PENGJIE REN, FEI CAI, FEI SUN, and DE RIJKE, MAARTEN
- Subjects
-
*MAXIMUM entropy method, *COMPUTATIONAL complexity, *ENTROPY, *LATENT semantic analysis
- Abstract
A preliminary version of this article appeared in the proceedings of CIKM 2020 [10]. In this extension, we (1) propose another interest extractor, i.e., dynamic routing, in the implicit interest mining module, and another type of disagreement regularization, i.e., output disagreement regularization, in our interest-aware, diversity promoting loss; (2) investigate the performance of our multi-interest, diversified, sequential recommendation model with different interest extractors in implicit interest mining, i.e., multi-head attention vs. dynamic routing; (3) investigate the performance of multi-interest, diversified, sequential recommendation with various numbers of latent interests; (4) explore the influence of the parameter λ on the performance of multi-interest, diversified, sequential recommendation; (5) investigate the performance of multi-interest, diversified, sequential recommendation with different types of disagreement regularization; (6) investigate the impact of maximum entropy regularization on the performance of multi-interest, diversified, sequential recommendation; (7) provide a case study to show the recommendations generated by multi-interest, diversified, sequential recommendation; (8) analyze the computational complexity of the baseline model as well as our proposal; and (9) survey more related work and conduct a more detailed analysis of the approach and experimental results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. Result diversification based on query-specific cluster ranking.
- Author
-
He, Jiyin, Meij, Edgar, and de Rijke, Maarten
- Subjects
-
*QUERYING (Computer science), *CLUSTER analysis (Statistics), *WEB search engines, *INFORMATION science, *ALGORITHMS, *DATA analysis, *COHESION (Linguistics), *PRECISION (Information retrieval), *INFORMATION processing
- Abstract
Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework, ranking and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have a diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
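The round-robin selection described in the abstract above can be sketched in a few lines. The cluster ranking itself is assumed to come from the framework's cluster-ranking step; the helper name and the list-of-lists input format are illustrative, not from the paper:

```python
def round_robin_diversify(ranked_clusters, k):
    """Build a diversified top-k list by cycling over the ranked
    clusters, taking each cluster's next-best unused document in turn.
    `ranked_clusters` is a list of clusters (best cluster first), each
    a list of documents ranked by relevance (illustrative input format)."""
    iters = [iter(c) for c in ranked_clusters if c]
    result = []
    while iters and len(result) < k:
        survivors = []
        for it in iters:
            try:
                result.append(next(it))
                survivors.append(it)  # cluster still has documents left
                if len(result) == k:
                    return result
            except StopIteration:
                pass  # cluster exhausted; drop it from the rotation
        iters = survivors
    return result
```

Restricting `ranked_clusters` to the highest-ranked clusters, as the framework prescribes, is what keeps the diversified list focused on likely-relevant material.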
27. Predicting podcast preference: An analysis framework and its application.
- Author
-
Tsagkias, Manos, Larson, Martha, and de Rijke, Maarten
- Subjects
- *
SOUND recordings , *ITUNES (Digital music program) , *LISTENING , *SURVEYS , *PODCASTING , *MUSIC software , *DIGITAL jukebox software , *DATA analysis - Abstract
Finding worthwhile podcasts can be difficult for listeners since podcasts are published in large numbers and vary widely with respect to quality and repute. Independently of their informational content, certain podcasts provide satisfying listening material while other podcasts have little or no appeal. In this paper we present PodCred, a framework for analyzing listener appeal, and we demonstrate its application to the task of automatically predicting the listening preferences of users. First, we describe the PodCred framework, which consists of an inventory of factors contributing to user perceptions of the credibility and quality of podcasts. The framework is designed to support automatic prediction of whether or not a particular podcast will enjoy listener preference. It consists of four categories of indicators related to the Podcast Content, the Podcaster, the Podcast Context, and the Technical Execution of the podcast. Three studies contributed to the development of the PodCred framework: a review of the literature on credibility for other media, a survey of prescriptive guidelines for podcasting, and a detailed data analysis. Next, we report on a validation exercise in which the PodCred framework is applied to a real-world podcast preference prediction task. Our validation focuses on select framework indicators that show promise of being both discriminative and readily accessible. We translate these indicators into a set of easily extractable “surface” features and use them to implement a basic classification system. The experiments carried out to evaluate the system use popularity levels in iTunes as ground truth and demonstrate that simple surface features derived from the PodCred framework are indeed useful for classifying podcasts. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
28. Articulating Information Needs in XML Query Languages.
- Author
-
Kamps, Jaap, Marx, Maarten, De Rijke, Maarten, and Sigurbjörnsson, Börkur
- Subjects
- *
XML (Extensible Markup Language) , *INFORMATION storage & retrieval systems , *DOCUMENT markup languages , *PROGRAMMING languages , *QUERY languages (Computer science) , *ARTIFICIAL languages , *COMPUTER programming , *COMPUTER software , *LANGUAGE & languages - Abstract
Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML documents comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. How does the expressiveness of languages for querying XML documents help users to express their information needs? We address this question from both an experimental and a theoretical point of view. Our experimental analysis compares a structure-ignorant with a structure-aware retrieval approach using the test suite of the INEX XML Retrieval Evaluation Initiative. Theoretically, we create two mathematical models of users' knowledge of a set of documents and define query languages which exactly fit these models. One of these languages corresponds to an XML version of fielded search, the other to the INEX query language. Our main experimental findings are: First, while structure is used in varying degrees of complexity, two-thirds of the queries can be expressed in a fielded-search-like format which does not use the hierarchical structure of the documents. Second, three-quarters of the queries use constraints on the context of the elements to be returned; these contextual constraints cannot be captured by ordinary keyword queries. Third, structure is used as a search hint, and not as a strict requirement, when judged against the underlying information need. Fourth, the use of structure in queries functions as a precision enhancing device. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
29. Conversations with Search Engines: SERP-based Conversational Response Generation.
- Author
-
PENGJIE REN, ZHUMIN CHEN, ZHAOCHUN REN, KANOULAS, EVANGELOS, MONZ, CHRISTOF, and DE RIJKE, MAARTEN
- Subjects
- *
INFORMATION needs , *NATURAL languages , *QUESTION answering systems , *SEARCH engines , *CROWDSOURCING - Abstract
In this article, we address the problem of answering complex information needs by conducting conversations with search engines, in the sense that users can express their queries in natural language and directly receive the information they need from a short system response in a conversational manner. Recently, there have been some attempts towards a similar goal, e.g., studies on Conversational Agents (CAs) and Conversational Search (CS). However, they either do not address complex information needs in search scenarios or they are limited to the development of conceptual frameworks and/or laboratory-based user studies. We pursue two goals in this article: (1) the creation of a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines, and (2) the development of a state-of-the-art pipeline for conversations with search engines, Conversations with Search Engines (CaSE), using this dataset. SaaC is built based on a multi-turn conversational search dataset, where we further employ workers from a crowdsourcing platform to summarize each relevant passage into a short, conversational response. CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator, which enables us to generate more accurate responses. We carry out experiments to show that CaSE is able to outperform strong baselines. We also conduct extensive analyses on the SaaC dataset to show where there is room for further improvement beyond CaSE. Finally, we release the SaaC dataset and the code for CaSE and all models used for comparison to facilitate future research on this topic. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
30. Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation.
- Author
-
Lin, Yujie, Ren, Pengjie, Chen, Zhumin, Ren, Zhaochun, Ma, Jun, and de Rijke, Maarten
- Subjects
- *
CONVOLUTIONAL neural networks , *RECURRENT neural networks , *FORECASTING , *RECOMMENDER systems , *GENERATIONS - Abstract
Most previous work on outfit recommendation focuses on designing visual features to enhance recommendations. Existing work neglects user comments of fashion items, which have been proven to be effective in generating explanations along with better recommendation results. We propose a novel neural network framework, neural outfit recommendation (NOR), that simultaneously provides outfit recommendations and generates abstractive comments. Neural outfit recommendation (NOR) consists of two parts: outfit matching and comment generation. For outfit matching, we propose a convolutional neural network with a mutual attention mechanism to extract visual features. The visual features are then decoded into a rating score for the matching prediction. For abstractive comment generation, we propose a gated recurrent neural network with a cross-modality attention mechanism to transform visual features into a concise sentence. The two parts are jointly trained based on a multi-task learning framework in an end-to-end back-propagation paradigm. Extensive experiments conducted on an existing dataset and a collected real-world dataset show NOR achieves significant improvements over state-of-the-art baselines for outfit recommendation. Meanwhile, our generated comments achieve impressive ROUGE and BLEU scores in comparison to human-written comments. The generated comments can be regarded as explanations for the recommendation results. We release the dataset and code to facilitate future research. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
31. HiTR: Hierarchical Topic Model Re-Estimation for Measuring Topical Diversity of Documents.
- Author
-
Azarbonyad, Hosein, Dehghani, Mostafa, Kenter, Tom, Marx, Maarten, Kamps, Jaap, and de Rijke, Maarten
- Subjects
- *
INFORMATION commons , *TASK analysis , *FEATURE extraction - Abstract
A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three distributions for assessing the diversity of documents: distributions of words within documents, words within topics, and topics within documents. Topic models play a central role in this approach and, hence, their quality is crucial to the efficacy of measuring topical diversity. The quality of topic models is affected by two causes: generality and impurity of topics. General topics only include common information of a background corpus and are assigned to most of the documents. Impure topics contain words that are not related to the topic. Impurity lowers the interpretability of topic models. Impure topics are likely to get assigned to documents erroneously. We propose a hierarchical re-estimation process aimed at removing generality and impurity. Our approach has three re-estimation components: (1) document re-estimation, which removes general words from the documents; (2) topic re-estimation, which re-estimates the distribution over words of each topic; and (3) topic assignment re-estimation, which re-estimates for each document its distributions over topics. For measuring topical diversity of text documents, our HiTR approach improves over the state-of-the-art as measured on the PubMed dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
32. Joint Neural Collaborative Filtering for Recommender Systems.
- Author
-
WANYU CHEN, FEI CAI, HONGHUI CHEN, and DE RIJKE, MAARTEN
- Subjects
- *
RECOMMENDER systems , *COST functions , *FILTERS & filtration - Abstract
We propose a Joint Neural Collaborative Filtering (J-NCF) method for recommender systems. The J-NCF model applies a joint neural network that couples deep feature learning and deep interaction modeling with a rating matrix. Deep feature learning extracts feature representations of users and items with a deep learning architecture based on a user-item rating matrix. Deep interaction modeling captures non-linear user-item interactions with a deep neural network using the feature representations generated by the deep feature learning process as input. J-NCF enables the deep feature learning and deep interaction modeling processes to optimize each other through joint training, which leads to improved recommendation performance. In addition, we design a new loss function for optimization that takes both implicit and explicit feedback, pointwise and pair-wise loss into account. Experiments on several real-world datasets show significant improvements of J-NCF over state-of-the-art methods, with improvements of up to 8.24% on the MovieLens 100K dataset, 10.81% on the MovieLens 1M dataset, and 10.21% on the Amazon Movies dataset in terms of HR@10. NDCG@10 improvements are 12.42%, 14.24%, and 15.06%, respectively. We also conduct experiments to evaluate the scalability and sensitivity of J-NCF. Our experiments show that the J-NCF model has a competitive recommendation performance with inactive users and different degrees of data sparsity when compared to state-of-the-art baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
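The loss function described in the J-NCF abstract above combines pointwise and pairwise signals over implicit and explicit feedback. A rough sketch of such a hybrid objective follows; the squared-error pointwise term, the BPR-style pairwise term, and the `alpha` balance weight are all illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hybrid_loss(pos_scores, neg_scores, ratings, alpha=0.5):
    """Hybrid objective sketch: a pointwise squared error against
    explicit ratings plus a BPR-style pairwise term that pushes
    interacted items above sampled non-interacted ones.
    `alpha` is a hypothetical balance weight, not from the paper."""
    pointwise = np.mean((pos_scores - ratings) ** 2)
    pairwise = -np.mean(np.log(sigmoid(pos_scores - neg_scores) + 1e-12))
    return alpha * pointwise + (1.0 - alpha) * pairwise
```

The loss shrinks both when predicted scores approach the explicit ratings and when observed items are scored well above sampled negatives, which is the behavior the abstract's joint objective aims for.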
33. Personalised Reranking of Paper Recommendations Using Paper Content and User Behavior.
- Author
-
XINYI LI, YIFAN CHEN, PETTIT, BENJAMIN, and DE RIJKE, MAARTEN
- Subjects
- *
SEARCH engines , *EMAIL , *NEWSLETTERS , *RESEARCH , *MANUFACTURING processes - Abstract
Academic search engines have been widely used to access academic papers, where users' information needs are explicitly represented as search queries. Some modern recommender systems have taken one step further by predicting users' information needs without the presence of an explicit query. In this article, we examine an academic paper recommender that sends out paper recommendations in email newsletters, based on the users' browsing history on the academic search engine. Specifically, we look at users who regularly browse papers on the search engine, and who sign up for the recommendation newsletters for the first time. We address the task of reranking the recommendation candidates that are generated by a production system for such users. We face the challenge that the users on whom we focus have not interacted with the recommender system before, which is a common scenario that every recommender system encounters when new users sign up. We propose an approach to reranking candidate recommendations that utilizes both paper content and user behavior. The approach is designed to suit the characteristics unique to our academic recommendation setting. For instance, content similarity measures can be used to find the closest match between candidate recommendations and the papers previously browsed by the user. To this end, we use a knowledge graph derived from paper metadata to compare entity similarities (papers, authors, and journals) in the embedding space. Since the users on whom we focus have no prior interactions with the recommender system, we propose a model to learn a mapping from users' browsed articles to user clicks on the recommendations. We combine both content and behavior into a hybrid reranking model that outperforms the production baseline significantly, providing a relative 13% increase in Mean Average Precision and 28% in Precision@1. Moreover, we provide a detailed analysis of the model components, highlighting where the performance boost comes from. 
The obtained insights reveal useful components for the reranking process and can be generalized to other academic recommendation settings as well, such as the utility of graph embedding similarity. Also, recent papers browsed by users provide stronger evidence for recommendation than historical ones. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
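The content-similarity component described in the abstract above, comparing candidate recommendations against previously browsed papers in an embedding space, can be sketched as follows. The max-over-browsed scoring rule and the function names are illustrative assumptions; the embeddings themselves would come from the paper's knowledge graph:

```python
import numpy as np

def rerank_by_embedding_similarity(candidates, browsed, embed):
    """Rerank candidate papers by their maximum cosine similarity to
    any paper the user previously browsed, in a shared embedding space.
    `embed` maps a paper id to its embedding vector (a numpy array)."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def score(candidate):
        # Closest match to anything in the browsing history wins.
        return max(cos(embed[candidate], embed[b]) for b in browsed)

    return sorted(candidates, key=score, reverse=True)
```

A full system, as the abstract notes, would blend such a content score with a behavioral model mapping browsed articles to clicks.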
34. Linear feature extraction for ranking.
- Author
-
Pandey, Gaurav, Ren, Zhaochun, Wang, Shuaiqiang, Veijalainen, Jari, and de Rijke, Maarten
- Subjects
- *
INFORMATION retrieval , *FEATURE extraction , *INFORMATION resources management , *INFORMATION science , *ALGORITHMS - Abstract
We address the feature extraction problem for document ranking in information retrieval. We then propose LifeRank, a Linear feature extraction algorithm for Ranking. In LifeRank, we regard each document collection for ranking as a matrix, referred to as the original matrix. We try to optimize a transformation matrix, so that a new matrix (dataset) can be generated as the product of the original matrix and a transformation matrix. The transformation matrix projects high-dimensional document vectors into lower dimensions. Theoretically, there could be very large transformation matrices, each leading to a new generated matrix. In LifeRank, we produce a transformation matrix so that the generated new matrix can match the learning to rank problem. Extensive experiments on benchmark datasets show the performance gains of LifeRank in comparison with state-of-the-art feature selection algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
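The core operation in the LifeRank abstract above, producing a new dataset as the product of the original matrix and a transformation matrix, is a plain matrix product. How the transformation matrix is optimized to match the learning-to-rank problem is the paper's contribution and is not shown; the shapes below are hypothetical:

```python
import numpy as np

def project_features(X, W):
    """Project an n x d document-feature matrix X (documents x features)
    through a learned d x k transformation W, producing the new
    k-dimensional dataset on which a ranker is then trained."""
    return X @ W

# Hypothetical shapes: 4 documents, 5 original features, 2 extracted features.
X = np.ones((4, 5))
W = np.ones((5, 2))
X_new = project_features(X, W)  # shape (4, 2)
```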
35. Sentence Relations for Extractive Summarization with Deep Neural Networks.
- Author
-
PENGJIE REN, ZHUMIN CHEN, ZHAOCHUN REN, FURU WEI, LIQIANG NIE, JUN MA, and DE RIJKE, MAARTEN
- Subjects
- *
ARTIFICIAL neural networks , *DATABASES , *ALGORITHMS , *DATA analysis , *SEMANTICS - Abstract
Sentence regression is a type of extractive summarization that achieves state-of-the-art performance and is commonly used in practical systems. The most challenging task within the sentence regression framework is to identify discriminative features to represent each sentence. In this article, we study the use of sentence relations, e.g., Contextual Sentence Relations (CSR), Title Sentence Relations (TSR), and Query Sentence Relations (QSR), so as to improve the performance of sentence regression. CSR, TSR, and QSR refer to the relations between a main body sentence and its local context, its document title, and a given query, respectively. We propose a deep neural network model, Sentence Relation-based Summarization (SRSum), that consists of five sub-models, PriorSum, CSRSum, TSRSum, QSRSum, and SFSum. PriorSum encodes the latent semantic meaning of a sentence using a bi-gram convolutional neural network. SFSum encodes the surface information of a sentence, e.g., sentence length, sentence position, and so on. CSRSum, TSRSum, and QSRSum are three sentence relation sub-models corresponding to CSR, TSR, and QSR, respectively. CSRSum evaluates the ability of each sentence to summarize its local contexts. Specifically, CSRSum applies a CSR-based word-level and sentence-level attention mechanism to simulate the context-aware reading of a human reader, where words and sentences that have anaphoric relations or local summarization abilities are easily remembered and paid attention to. TSRSum evaluates the semantic closeness of each sentence with respect to its title, which usually reflects the main ideas of a document. TSRSum applies a TSR-based attention mechanism to simulate people's reading ability with the main idea (title) in mind. QSRSum evaluates the relevance of each sentence with given queries for the query-focused summarization. QSRSum applies a QSR-based attention mechanism to simulate the attentive reading of a human reader with some queries in mind. The mechanism can recognize which parts of the given queries are more likely answered by a sentence under consideration. Finally, as a whole, SRSum automatically learns useful latent features by jointly learning representations of query sentences, content sentences, and title sentences as well as their relations. We conduct extensive experiments on six benchmark datasets, including generic multi-document summarization and query-focused multi-document summarization. On both tasks, SRSum achieves comparable or superior performance compared with state-of-the-art approaches in terms of multiple ROUGE metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
36. Neural information retrieval: at the end of the early years.
- Author
-
Onal, Kezban Dilek, Zhang, Ye, Altingovde, Ismail Sengor, Rahman, Md Mustafizur, Karagoz, Pinar, Braylan, Alex, Dang, Brandon, Chang, Heng-Lu, Kim, Henna, McNamara, Quinten, Angert, Aaron, Banner, Edward, Khetan, Vivek, McDonnell, Tyler, Nguyen, An Thanh, Xu, Dan, Wallace, Byron C., de Rijke, Maarten, and Lease, Matthew
- Subjects
- *
ARTIFICIAL neural networks , *INFORMATION retrieval , *DEEP learning , *NEUROSCIENCES , *INFORMATION storage & retrieval systems - Abstract
A recent “third wave” of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
37. Efficient Structured Learning for Personalized Diversification.
- Author
-
Liang, Shangsong, Cai, Fei, Ren, Zhaochun, and De Rijke, Maarten
- Subjects
- *
SUPERVISED learning , *WEB personalization , *DIRICHLET principle , *AD hoc computer networks , *WEB search engines , *PREDICTION theory - Abstract
This paper is concerned with the problem of personalized diversification of search results, with the goal of enhancing the performance of both plain diversification and plain personalization algorithms. In previous work, the problem has mainly been tackled by means of unsupervised learning. To further enhance the performance, we propose a supervised learning strategy. Specifically, we set up a structured learning framework for conducting supervised personalized diversification, in which we add features extracted directly from tokens of documents and those utilized by unsupervised personalized diversification algorithms, and, importantly, those generated from our proposed user-interest latent Dirichlet topic model. We also define two constraints in our structured learning framework to ensure that search results are both diversified and consistent with a user's interest. To further boost the efficiency of training, we propose a fast training framework for our proposed method by adding additional multiple highly violated but also diversified constraints at every training iteration of the cutting-plane algorithm. We conduct experiments on an open dataset and find that our supervised learning strategy outperforms unsupervised personalized diversification methods as well as other plain personalization and plain diversification methods. Our fast training framework significantly saves training time while it maintains almost the same performance. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
38. A Comparative Analysis of Interleaving Methods for Aggregated Search.
- Author
-
CHUKLIN, ALEKSANDR, SCHUTH, ANNE, KE ZHOU, and DE RIJKE, MAARTEN
- Subjects
- *
CODING theory , *DATA compression (Telecommunication) , *DIGITAL electronics , *INFORMATION storage & retrieval systems , *ELECTRONIC information resources - Abstract
A result page of a modern search engine often goes beyond a simple list of "10 blue links." Many specific user needs (e.g., News, Image, Video) are addressed by so-called aggregated or vertical search solutions: specially presented documents, often retrieved from specific sources, that stand out from the regular organic Web search results. When it comes to evaluating ranking systems, such complex result layouts raise their own challenges. This is especially true for so-called interleaving methods that have arisen as an important type of online evaluation: by mixing results from two different result pages, interleaving can easily break the desired Web layout in which vertical documents are grouped together, and hence hurt the user experience. We conduct an analysis of different interleaving methods as applied to aggregated search engine result pages. Apart from conventional interleaving methods, we propose two vertical-aware methods: one derived from the widely used Team-Draft Interleaving method by adjusting it in such a way that it respects vertical document groupings, and another based on the recently introduced Optimized Interleaving framework. We show that our proposed methods are better at preserving the user experience than existing interleaving methods while still performing well as a tool for comparing ranking systems. For evaluating our proposed vertical-aware interleaving methods, we use real-world click data as well as simulated clicks and simulated ranking systems. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
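Standard Team-Draft Interleaving, the method from which the abstract's vertical-aware variant is derived, can be sketched as follows. The vertical-aware adjustment that keeps vertical documents grouped together is not shown; this is the conventional baseline the paper modifies:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Team-Draft Interleaving: in each round a coin flip decides which
    team picks first; each team then adds its highest-ranked document
    not yet in the interleaved list. Per-document team labels are kept
    so that clicks can later be attributed to ranker A or ranker B."""
    rng = rng or random.Random(0)
    interleaved, teams, seen = [], [], set()

    def pick(ranking, team):
        for doc in ranking:
            if doc not in seen:
                seen.add(doc)
                interleaved.append(doc)
                teams.append(team)
                return

    total = len(set(ranking_a) | set(ranking_b))
    while len(seen) < total:
        if rng.randint(0, 1) == 0:
            pick(ranking_a, "A"); pick(ranking_b, "B")
        else:
            pick(ranking_b, "B"); pick(ranking_a, "A")
    return interleaved, teams
```

As the abstract observes, mixing documents this way can split a vertical block apart, which is exactly the user-experience problem the proposed vertical-aware variants address.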
39. Bayesian feature interaction selection for factorization machines.
- Author
-
Chen, Yifan, Wang, Yang, Ren, Pengjie, Wang, Meng, and de Rijke, Maarten
- Subjects
- *
FEATURE selection , *ARTIFICIAL intelligence , *FACTORIZATION , *MACHINERY - Abstract
Factorization machines are a generic supervised method for a wide range of tasks in the field of artificial intelligence, such as prediction and inference, that can effectively model feature interactions. However, handling combinations of features is expensive due to the exponential growth of feature interactions with the order. In nature, not all feature interactions are equally useful for prediction. Recently, a large number of methods that perform feature interaction selection have attracted great attention because of their effectiveness at filtering out useless feature interactions. Current feature interaction selection methods suffer from the following limitations: (1) they assume that all users share the same feature interactions; and (2) they select pairwise feature interactions only. In this paper, we propose novel Bayesian variable selection methods, targeting feature interaction selection for factorization machines, which effectively reduce the number of interactions. We study personalized feature interaction selection to account for individual preferences, and further extend the model to investigate higher-order feature interaction selection on higher-order factorization machines. We provide empirical evidence for the advantages of the proposed Bayesian feature interaction selection methods using different prediction tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
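For reference, a second-order factorization machine scores an input as y = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, where the pairwise term can be computed in O(dk) via the standard reformulation. The sketch below shows the full (unselected) model, i.e., the quantity whose interaction terms the abstract's Bayesian methods prune; the selection itself is not implemented here:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction:
    y = w0 + w.x + sum_{i<j} <V_i, V_j> x_i x_j,
    with the pairwise term computed in O(d k) via
    0.5 * sum_f ((sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2)."""
    linear = w0 + np.dot(w, x)
    xv = V.T @ x  # shape (k,): per-factor weighted sums
    pairwise = 0.5 * np.sum(xv ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise
```

With d features there are d(d-1)/2 pairwise terms, which is the exponential-growth-with-order problem that motivates interaction selection.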
40. Query modeling for entity search based on terms, categories, and examples.
- Author
-
Balog, Krisztian, Bron, Marc, and De Rijke, Maarten
- Subjects
- *
QUERY (Information retrieval system) , *QUERYING (Computer science) , *DATABASE searching , *INFORMATION storage & retrieval systems , *COMPUTER systems , *INFORMATION processing - Abstract
Users often search for entities instead of documents, and in this setting, are willing to provide extra input, in addition to a series of query terms, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate and provide insights in the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation, and also demonstrate the effectiveness of category-based expansion using example entities. Our best performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
41. Search behavior of media professionals at an audiovisual archive: A transaction log analysis.
- Author
-
Huurnink, Bouke, Hollink, Laura, van den Heuvel, Wietske, and de Rijke, Maarten
- Subjects
- *
MASS media employees , *AUDIOVISUAL archives , *BROADCASTING industry , *AUDIOVISUAL materials , *MASS media , *BROADCASTERS , *DOCUMENTARY filmmakers , *TELEVISION producers & directors , *LOGBOOKS - Abstract
Finding audiovisual material for reuse in new programs is an important activity for news producers, documentary makers, and other media professionals. Such professionals are typically served by an audiovisual broadcast archive. We report on a study of the transaction logs of one such archive. The analysis includes an investigation of commercial orders made by the media professionals and a characterization of sessions, queries, and the content of terms recorded in the logs. One of our key findings is that there is a strong demand for short pieces of audiovisual material in the archive. In addition, while searchers are generally able to quickly navigate to a usable audiovisual broadcast, it takes them longer to place an order when purchasing a subsection of a broadcast than when purchasing an entire broadcast. Another key finding is that queries predominantly consist of (parts of) broadcast titles and of proper names. Our observations imply that it may be beneficial to increase support for fine-grained access to audiovisual material, for example, through manual segmentation or content-based analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
42. Contextual factors for finding similar experts.
- Author
-
Hofmann, Katja, Balog, Krisztian, Bogers, Toine, and de Rijke, Maarten
- Subjects
- *
EXPERT systems , *ALGORITHMS , *INFORMATION retrieval , *DOCUMENTATION , *SEARCH engines , *INFORMATION resources management , *INFORMATION storage & retrieval systems , *ORGANIZATIONAL structure , *CONTENT mining - Abstract
Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. Important outcomes are models that identify factors that influence expert finding. Expertise retrieval addresses the same problem, expert finding, but from a system-centered perspective. The main focus has been on developing content-based algorithms similar to document search. These algorithms identify matching experts primarily on the basis of the textual content of documents with which experts are associated. Other factors, such as the ones identified by expertise-seeking models, are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content-based retrieval models and evaluate them in a retrieval experiment. Our main finding is that while content-based features are the most important, human participants also take contextual factors into account, such as media experience and organizational structure. We develop two principled ways of modeling the identified factors and integrate them with content-based retrieval models. Our experiments show that models combining content-based and contextual factors can significantly outperform existing content-based models. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
43. A Multidisciplinary Approach to Unlocking Television Broadcast Archives.
- Author
-
HOLLINK, LAURA, SCHREIBER, GUUS, HUURNINK, BOUKE, LIEMPT, MICHIEL VAN, DE RIJKE, MAARTEN, SMEULDERS, ARNOLD, OOMEN, JOHAN, and DE JONG, ANNEMIEKE
- Subjects
- *
TELEVISION archives , *BROADCASTING archives , *TELEVISION broadcasting , *INFORMATION retrieval , *ARCHIVAL research , *MULTIMEDIA systems - Abstract
Audiovisual material is a vital component of the world's heritage but it remains difficult to access. With the Netherlands Institute for Sound and Vision as one of its partners, the MuNCH project aims to investigate new methods for improving access to a wide range of audiovisual documents. MuNCH brings together three research fields: multimedia analysis, language technology, and semantic technologies. Within the MuNCH project, we have investigated several combinations of these fields. We have compared text matching, ontology querying, and semantic visual querying as methods to translate a multimedia query to the vocabulary of the retrieval system. In addition, we have investigated how users make such a translation, and have used this as a benchmark to create automatic methods. We have used multimedia technology to automatically detect objects and scenes as they occur in video, and made use of language technology to exploit automatic transcriptions of speech. We have enriched the Sound and Vision thesaurus that is used to annotate the TV programmes in order to provide a user with a wider range of search results. In order to verify the results of the project against real user needs, MuNCH has participated in the creation of a logging system which monitors the usage of the Sound and Vision catalogue system. Insights into the needs of real users will be used as input for all three of MuNCH's research strands. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
44. Monolingual Document Retrieval for European Languages.
- Author
-
Hollink, Vera, Kamps, Jaap, Monz, Christof, and de Rijke, Maarten
- Subjects
- *
INFORMATION retrieval , *INFORMATION science , *INFORMATION resources management , *DOCUMENTATION , *INFORMATION services , *SEARCH engines - Abstract
Recent years have witnessed considerable advances in information retrieval for European languages other than English. We give an overview of commonly used techniques and we analyze them with respect to their impact on retrieval effectiveness. The techniques considered range from linguistically motivated techniques, such as morphological normalization and compound splitting, to knowledge-free approaches, such as n-gram indexing. Evaluations are carried out against data from the CLEF campaign, covering eight European languages. Our results show that for many of these languages a modicum of linguistic techniques may lead to improvements in retrieval effectiveness, as can the use of language independent techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF