Descriptor: "Text Retrieval Conference" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Text Retrieval Conference" returned 278 results.

1. An intrinsic evaluation of the Waterloo spam rankings of the ClueWeb09 and ClueWeb12 datasets.

Author: Yılmazel, İbrahim Barış and Arslan, Ahmet
Subjects: *SPAM email, *WEBSITES, *DISTRIBUTION (Probability theory), *APPRAISERS
Abstract: The ClueWeb09 dataset and its successor, the ClueWeb12 dataset, are two of the largest collections of Web pages released by the Text REtrieval Conference (TREC). The ClueWeb datasets were used in various TREC tracks that ran from 2009 to 2017. Each year, approximately 50 new queries are released and a pool of Web pages is judged against these queries by human assessors as relevant, non-relevant or spam. In this article, a ground truth for binary classification (spam vs. non-spam) is constructed from Web pages judged as spam or relevant, under the assumption that a Web page judged relevant for any query cannot be spam. Based on this ground truth, we evaluate the classification performance of the Waterloo spam rankings (Fusion, Britney, GroupX and UK2006), which have traditionally been used to identify and filter spam pages in retrieval systems. The experimental results in terms of the universal binary classification evaluation measures suggest that Fusion (with threshold = 11%) performs best on the ClueWeb09 dataset. Analysis of the frequency distributions of relevant/spam documents over spam scores reveals that GroupX is the most powerful at identifying relevant documents, whereas Fusion is the most powerful at identifying spam documents. It is also confirmed that the Fusion spam ranking is less effective on the ClueWeb12 dataset than on ClueWeb09. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
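
A minimal Python sketch of the evaluation idea described above: derive a spam/non-spam ground truth from assessor labels (a page judged relevant for any query counts as non-spam), then score a percentile spam ranking at a fixed threshold. The label strings, the function names, and the rule "score below threshold means spam" are assumptions for illustration, not the authors' code.

```python
# Hypothetical label encoding: "relevant", "nonrelevant", "spam".
# Assumed rule: a percentile spam score below `threshold` predicts spam.

def build_ground_truth(judgments):
    """judgments: iterable of (query_id, doc_id, label). Returns {doc_id: is_spam}."""
    relevant, spam = set(), set()
    for _query_id, doc_id, label in judgments:
        if label == "relevant":
            relevant.add(doc_id)
        elif label == "spam":
            spam.add(doc_id)
    # A page judged relevant for any query is treated as non-spam.
    truth = {doc_id: False for doc_id in relevant}
    for doc_id in spam - relevant:
        truth[doc_id] = True
    return truth


def evaluate_threshold(truth, spam_scores, threshold):
    """spam_scores: {doc_id: percentile 0..99}, lower = spammier (assumed)."""
    tp = fp = tn = fn = 0
    for doc_id, is_spam in truth.items():
        predicted_spam = spam_scores[doc_id] < threshold
        if predicted_spam and is_spam:
            tp += 1
        elif predicted_spam:
            fp += 1
        elif is_spam:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(truth) if truth else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


judgments = [("q1", "d1", "relevant"), ("q1", "d2", "spam"),
             ("q2", "d2", "relevant"), ("q2", "d3", "spam"),
             ("q1", "d4", "nonrelevant")]
truth = build_ground_truth(judgments)  # d2 is relevant for q2, so it is non-spam
metrics = evaluate_threshold(truth, {"d1": 50, "d2": 5, "d3": 3}, threshold=11)
```

Documents judged only non-relevant (d4 here) carry no spam evidence either way, so they are left out of the ground truth.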

2. Indian School of Mines at INEX 2011 Snippet Retrieval Task

Author: Pal, Sukomal, Tamrakar, Preeti, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Geva, Shlomo, editor, Kamps, Jaap, editor, and Schenkel, Ralf, editor
Published: 2012
Full Text: View/download PDF

3. A New Approach of Intelligent Data Retrieval Paradigm

Author: Falah Hassan Ali Al-akashi and Diana Inkpen
Subjects: Scheme (programming language), Information retrieval, Computer science, Rank (computer programming), General Medicine, computer.software_genre, Intelligent agent, Data retrieval, Ranking, Vector space model, Linear combination, computer, Text Retrieval Conference, computer.programming_language
Abstract: What is a real-time agent, how does it remedy ongoing daily frustrations for users, and how does it improve retrieval performance on the World Wide Web? These are the main questions we focus on in this manuscript. In many distributed information retrieval systems, information in agents should be ranked based on a combination of multiple criteria. Linear combination of ranks has been the dominant approach due to its simplicity and effectiveness. Such a combination scheme in a distributed infrastructure requires that the ranks in resources or agents be comparable to each other before they are combined. The main challenge is transforming the raw rank values of different criteria appropriately to make them comparable before any combination. Different ways of ranking agents make this strategy difficult. In this research, we demonstrate how to rank Web documents based on resource-provided information and how to combine the ranking schemes of several resources at once. The proposed system was applied specifically to data provided by agents to create a comparable combination of different attributes. The proposed approach was tested on the queries provided by the Text Retrieval Conference (TREC). Experimental results showed that our approach is effective and robust compared with offline search platforms.
Published: 2021
Full Text: View/download PDF
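
A minimal Python sketch of the general pattern the abstract describes: make raw scores from different criteria comparable, then fuse them with a linear combination. Min-max normalization is assumed here as one common choice; the paper's own transformation may differ, and the criteria and weights below are hypothetical.

```python
# Assumed normalization: min-max onto [0, 1]. Weights are hypothetical.

def min_max_normalize(scores):
    """Map raw scores {doc: value} onto [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (value - lo) / (hi - lo) for doc, value in scores.items()}


def linear_combination(criteria, weights):
    """criteria: list of {doc: raw score}; weights: one weight per criterion.
    Documents missing from a criterion contribute 0 for that criterion."""
    normalized = [min_max_normalize(c) for c in criteria]
    docs = set().union(*normalized)
    fused = {
        doc: sum(w * n.get(doc, 0.0) for w, n in zip(weights, normalized))
        for doc in docs
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


ranking = linear_combination(
    [{"a": 10, "b": 20, "c": 30},        # e.g. a count-like criterion
     {"a": 0.9, "b": 0.5, "c": 0.1}],    # e.g. a probability-like criterion
    weights=[0.7, 0.3],
)
```

Without normalization, the first criterion's larger raw range would dominate the sum regardless of the weights, which is the comparability problem the abstract points to.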

4. A Principled Approach Using Fuzzy Set Theory for Passage-Based Document Retrieval

Author: Edward Kai Fung Dang, Robert W. P. Luk, and James Allan
Subjects: Normalization (statistics), Operationalization, Computer science, business.industry, Applied Mathematics, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Fuzzy set, computer.software_genre, Semantics, Computational Theory and Mathematics, Artificial Intelligence, Control and Systems Engineering, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Relevance (information retrieval), Artificial intelligence, Document retrieval, business, Heuristics, Text Retrieval Conference, computer, Natural language processing
Abstract: In this article, we present a novel principled approach to passage-based (document) retrieval using fuzzy set theory. The approach formulates passage score combination according to general relevance decision principles. By operationalizing these principles using aggregation operators of fuzzy set theory, our approach justifies the common heuristic of taking the maximum constituent passage score as the overall document score. Experiments show that this heuristic is only near-best, with some fuzzy set aggregation operators stipulated in our approach being better methods. The significance of our principled approach is that it makes many passage score combination methods applicable, potentially bringing further performance gains. Experiments on several Text Retrieval Conference (TREC) collections demonstrate that our approach performs significantly better than document-based retrieval. While recent works in the literature mostly employ document-based rather than passage-based retrieval due to the common conception that document length normalization solves the problem of varying document lengths, our results show that document length normalization alone is not sufficient, especially in pseudo-relevance feedback retrieval.
Published: 2021
Full Text: View/download PDF
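
A minimal Python sketch contrasting the max heuristic with two standard fuzzy aggregation operators for turning passage scores into a document score. Passage scores are assumed to lie in [0, 1]; the probabilistic sum and OWA operators are illustrative examples and are not necessarily the operators the authors found best.

```python
import math

# Passage scores are assumed to be in [0, 1].

def aggregate_max(passage_scores):
    # The common heuristic: the best passage decides the document score.
    return max(passage_scores)


def aggregate_probabilistic_sum(passage_scores):
    # Fuzzy union via the algebraic (probabilistic) sum: 1 - prod(1 - s).
    return 1.0 - math.prod(1.0 - s for s in passage_scores)


def aggregate_owa(passage_scores, weights):
    # Ordered weighted averaging: weights apply to scores sorted high-to-low.
    # Entries beyond the shorter of the two lists are ignored.
    ordered = sorted(passage_scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


doc_a = [0.6]          # one strong passage
doc_b = [0.6, 0.5]     # the same strong passage plus a second good one
print(aggregate_max(doc_a), aggregate_max(doc_b))  # max cannot separate them
print(aggregate_probabilistic_sum(doc_a), aggregate_probabilistic_sum(doc_b))
```

The example shows why the choice of operator matters: the max heuristic ties the two documents, while the probabilistic sum rewards the document with more supporting evidence.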

5. Karen Spärck Jones and Summarization

Author: Maybury, Mark T., Croft, W. Bruce, editor, and Tait, John I., editor
Published: 2005
Full Text: View/download PDF

6. What makes a query temporally sensitive?

Author: Willis, Craig, Sherman, Garrick, and Efron, Miles
Subjects: *INFORMATION needs, *INFORMATION retrieval, *SEARCH engines, *INFORMATION-seeking behavior, TEXT Retrieval Conference
Abstract: This work examines factors that affect manual classifications of 'temporally sensitive' information needs. We introduce the concepts of temporal relevance and temporal topicality to distinguish between different aspects of temporal retrieval research. We use qualitative and quantitative techniques to analyze 660 topics from the Text Retrieval Conference (TREC) previously used in the experimental evaluation of temporal retrieval models. We use regression analysis to model previous manual classifications. We identify factors and potential problems with previous classifications and propose principles and guidelines for future work on the evaluation of temporal retrieval models. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

7. Estimating reliability of the retrieval systems effectiveness rank based on performance in multiple experiments.

Author: Zhang, Shuxiang and Ravana, Sri
Subjects: *INFORMATION retrieval, *INFORMATION retrieval software, *INFORMATION retrieval standards, *INFORMATION storage & retrieval systems, TEXT Retrieval Conference
Abstract: For decades, the use of test collections has been a standardized approach in information retrieval evaluation. However, given the intrinsic nature of their construction, this approach has a number of limitations, such as bias in pooling, disagreement between human assessors, different levels of topic difficulty, and performance constraints of the evaluation metrics. Any of these factors may distort the measured relative effectiveness of different retrieval strategies, or rather retrieval systems, and thus result in unreliable system rankings. In this study, we suggest techniques for estimating the reliability of the retrieval system effectiveness rank based on rankings from multiple experiments. These rankings may come from previous experimental results or be generated by conducting multiple experiments using a smaller number of topics. These techniques will assist in precisely predicting the performance of each system in future experiments. To validate the proposed rank reliability estimation methods, two alternative system ranking methods are proposed to generate new system rankings. The experimentation shows that system rank correlation coefficient values mostly remain above 0.8 against the gold standard. On top of that, the proposed techniques generate system rankings that are more reliable than the baseline [traditional system ranking techniques used in Text Retrieval Conference (TREC)-like initiatives]. The results from both TREC-2004 and TREC-8 show the same outcome, which further confirms the effectiveness of the proposed rank reliability estimation method. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF
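
A minimal Python sketch of a rank correlation coefficient for comparing an estimated system ranking with a gold-standard ranking. The abstract does not name the coefficient; Kendall's tau is assumed here because it is widely used for comparing TREC system rankings. This is the tau-a variant and assumes no tied ranks; the system ids are hypothetical.

```python
from itertools import combinations

# Kendall's tau-a between two rankings of the same systems (best first).
# Assumes no ties; the choice of tau is an assumption, not the paper's stated metric.

def kendall_tau(ranking_a, ranking_b):
    if len(ranking_a) < 2:
        raise ValueError("need at least two systems")
    if sorted(ranking_a) != sorted(ranking_b) or len(set(ranking_a)) != len(ranking_a):
        raise ValueError("rankings must order the same distinct systems")
    pos_a = {system: i for i, system in enumerate(ranking_a)}
    pos_b = {system: i for i, system in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        # A pair is concordant when both rankings order it the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(ranking_a) * (len(ranking_a) - 1) / 2
    return (concordant - discordant) / n_pairs


gold = ["S1", "S2", "S3", "S4"]
estimated = ["S1", "S3", "S2", "S4"]   # one adjacent pair swapped
tau = kendall_tau(gold, estimated)
```

With four systems there are six pairs; a single swapped pair gives (5 - 1) / 6, just under the 0.8 level the abstract mentions.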

8. Knowledge attention sandwich neural network for text classification

Author: Jianyu Zhao, Qichuan Yang, Zifeng Hou, Changjian Hu, Yang Zhang, and Zhiqiang Zhan
Subjects: 0209 industrial biotechnology, Parsing, Artificial neural network, Generalization, Computer science, business.industry, Cognitive Neuroscience, Treebank, 02 engineering and technology, computer.software_genre, Machine learning, Computer Science Applications, Reduction (complexity), 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Adaptive learning, Artificial intelligence, Representation (mathematics), business, Text Retrieval Conference, computer
Abstract: Text classification is a fundamental and crucial issue in many Natural Language Processing (NLP) tasks. An effective and efficient representation model is the key to text classification. However, most existing representation models either learn insufficient structural information or rely solely on pre-defined structures, resulting in degraded performance and generalization capability. We propose a novel Sandwich Neural Network (SNN), which is able to learn local semantic and global structural representations automatically without parsers. To combine semantic and structural representations sensibly, we propose four fusion methods incorporated with SNN: Static Fusion, Adaptive Learning, Self-Attention, and Knowledge Attention. Static Fusion weights semantic and structural representations equally, Adaptive Learning learns the weights at the corpus level, and Self-Attention learns the weights at the instance level. More importantly, within the Knowledge Attention fusion method, external semantic and structural knowledge is incorporated into SNN to improve the attention procedure and further boost the performance of SNN. Evaluated on four mainstream datasets, Text REtrieval Conference (TREC), SUBJectivity (SUBJ), Movie Reviews (MR) and Stanford Sentiment Treebank with Five Labels (SST-5), the proposed Knowledge Attention Sandwich Neural Network (KA-SNN) model achieves very competitive performance, with accuracies of 96.2% (TREC), 94.1% (SUBJ), 82.3% (MR) and 51.5% (SST-5). Moreover, compared with the Self-Attention fusion method, the proposed Knowledge Attention reduces the structural complexity of the attention module by 77.66-81.43% and the computing time by 21.47-34.05%.
Published: 2020
Full Text: View/download PDF

9. Corpus-Level End-to-End Exploration for Interactive Systems

Author: Zhiwen Tang and Grace Hui Yang
Subjects: FOS: Computer and information sciences, Text corpus, Computer Science - Artificial Intelligence, Computer science, business.industry, Process (engineering), media_common.quotation_subject, General Medicine, Computer Science - Information Retrieval, Task (project management), Domain (software engineering), Search engine, Artificial Intelligence (cs.AI), Reinforcement learning, Artificial intelligence, Function (engineering), business, Text Retrieval Conference, Information Retrieval (cs.IR), media_common
Abstract: A core interest in building Artificial Intelligence (AI) agents is to let them interact with and assist humans. One example is Dynamic Search (DS), which models the process by which a human works with a search engine agent to accomplish a complex, goal-oriented task. Early DS agents using Reinforcement Learning (RL) have achieved only limited success because of (1) their lack of direct control over which documents to return and (2) the difficulty of recovering from wrong search trajectories. In this paper, we present a novel corpus-level end-to-end exploration (CE3) method to address these issues. In our method, an entire text corpus is compressed into a global low-dimensional representation, which enables the agent to gain access to the full state and action spaces, including the under-explored areas. We also propose a new form of retrieval function, whose linear approximation allows end-to-end manipulation of documents. Experiments on the Text REtrieval Conference (TREC) Dynamic Domain (DD) Track show that CE3 outperforms the state-of-the-art DS systems. Comment: Accepted to AAAI 2020.
Published: 2020
Full Text: View/download PDF

10. Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries

Author: Alen Doko, Ivan Boban, and Sven Gotovac
Subjects: Data pre-processing, Physics and Astronomy (miscellaneous), Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 02 engineering and technology, computer.software_genre, Lemmatization, Sentence retrieval, TF-ISF, Stemming, Management of Technology and Innovation, 0202 electrical engineering, electronic engineering, information engineering, Document retrieval, Engineering (miscellaneous), Text Retrieval Conference, Stop words, business.industry, Lemmatisation, Term (logic), 021001 nanoscience & nanotechnology, Focus (linguistics), Intelligente und verteilte Systeme, 020201 artificial intelligence & image processing, Artificial intelligence, Language model, Institut für Softwaretechnologie, 0210 nano-technology, business, computer, Sentence, Natural language processing
Abstract: In this paper we focus on sentence retrieval, which is similar to document retrieval but with a smaller unit of retrieval. Using data pre-processing in document retrieval is generally considered useful. When it comes to sentence retrieval, the situation is less clear. In this paper we use the TF-ISF (term frequency – inverse sentence frequency) method for sentence retrieval. As pre-processing steps, we use stop word removal and the language modeling techniques of stemming and lemmatization. We also experiment with different query lengths. The results show that data pre-processing with stemming and lemmatization is as useful for sentence retrieval as it is for document retrieval. Lemmatization produces better results with longer queries, while stemming shows worse results with longer queries. For the experiment, we used data from the Text Retrieval Conference (TREC) novelty tracks.
Published: 2020
Full Text: View/download PDF
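
A minimal Python sketch of TF-ISF sentence scoring with optional pre-processing. It assumes one commonly used TF-ISF formulation, score(s, q) = sum over query terms t of log(tf(t, q) + 1) · log(tf(t, s) + 1) · log((n + 1) / (0.5 + sf(t))), where n is the number of sentences and sf(t) is the number of sentences containing t; the paper's exact variant may differ. The stop-word list and `crude_stem` are hypothetical stand-ins, not the stemmer or lemmatizer used by the authors.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (hypothetical).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def crude_stem(word):
    """Hypothetical toy normalizer: drop one trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def tokenize(text, normalize=lambda w: w):
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [normalize(w) for w in words if w and w not in STOP_WORDS]


def tf_isf_scores(query, sentences, normalize=lambda w: w):
    """Return one TF-ISF score per sentence for the given query."""
    sentence_tfs = [Counter(tokenize(s, normalize)) for s in sentences]
    n = len(sentences)
    sentence_freq = Counter()
    for tf in sentence_tfs:
        sentence_freq.update(tf.keys())  # count each term once per sentence
    query_tf = Counter(tokenize(query, normalize))
    scores = []
    for tf in sentence_tfs:
        score = 0.0
        for term, q_count in query_tf.items():
            if term in tf:
                score += (math.log(q_count + 1) * math.log(tf[term] + 1)
                          * math.log((n + 1) / (0.5 + sentence_freq[term])))
        scores.append(score)
    return scores


sentences = ["Stemming reduces words.", "The cat sat.",
             "Lemmatization maps words to lemmas."]
scores = tf_isf_scores("stemming words", sentences)
```

Passing `normalize=crude_stem` lets the query "word" match the sentence token "words", which is the kind of vocabulary-mismatch reduction that stemming and lemmatization are meant to provide.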

11. Exploiting salient semantic analysis for information retrieval.

Author: Luo, Jing, Meng, Bo, Quan, Changqin, and Tu, Xinhui
Subjects: INFORMATION retrieval, NATURAL language processing, TEXT Retrieval Conference, LANGUAGE & languages
Abstract: Recently, many Wikipedia-based methods have been proposed to improve the performance of different natural language processing (NLP) tasks, such as semantic relatedness computation, text classification and information retrieval. Among these methods, salient semantic analysis (SSA) has been proven to be an effective way to generate conceptual representations for words or documents. However, its feasibility and effectiveness in information retrieval are mostly unknown. In this paper, we study how to efficiently use SSA to improve information retrieval performance, and propose an SSA-based retrieval method under the language model framework. First, the SSA model is adopted to build conceptual representations for documents and queries. Then, these conceptual representations and the bag-of-words (BOW) representations are used in combination to estimate the language models of queries and documents. The proposed method is evaluated on several standard Text Retrieval Conference (TREC) collections. Experimental results on these collections show that the proposed models consistently outperform the existing Wikipedia-based retrieval methods. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

12. Clinical-decision support based on medical literature: A complex network approach.

Author: Jiang, Jingchi, Zheng, Jichuan, Zhao, Chao, Su, Jia, Guan, Yi, and Yu, Qiubin
Subjects: *DECISION support systems, *GRAPH theory, *MEDICAL literature, *RANKING (Statistics), TEXT Retrieval Conference
Abstract: In making clinical decisions, clinicians often review medical literature to ensure the reliability of diagnosis, test, and treatment, because the medical literature can answer clinical questions and assist clinicians in making clinical decisions. Therefore, finding the appropriate literature is a critical problem for clinical-decision support (CDS). First, the present study employs search engines to retrieve relevant literature about patient records. However, the results of this traditional method are usually unsatisfactory. To improve the relevance of the retrieval results, a medical literature network (MLN) based on these retrieved papers is constructed. Then, we show that this MLN has the small-world and scale-free properties of a complex network. According to the structural characteristics of the MLN, we adopt two methods to further identify potentially relevant literature in addition to the retrieved literature. By integrating these potential papers into the MLN, a more comprehensive MLN is built to answer the questions posed by actual patient records. Furthermore, we propose a re-ranking model to sort all papers by relevance. We experimentally find that the re-ranking model can improve the normalized discounted cumulative gain of the results. In our participation in the Text Retrieval Conference 2015, our clinical-decision method based on the MLN also yields higher scores than the medians in most topics and achieves the best scores for topics #11 and #12. These research results indicate that our study can be used to effectively assist clinicians in making clinical decisions, and the MLN can facilitate the investigation of CDS. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF
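
A minimal Python sketch of normalized discounted cumulative gain (nDCG@k), the measure the abstract reports improving with re-ranking. It assumes one common formulation with linear gains and a log2(rank + 1) discount; other variants use 2^rel - 1 gains. The documents and relevance grades are hypothetical.

```python
import math

# Linear gains, log2(rank + 1) discount (one common nDCG variant).

def dcg(gains):
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_docs, qrels, k):
    """ranked_docs: doc ids in system order; qrels: {doc_id: graded relevance}.
    Unjudged documents count as relevance 0."""
    gains = [qrels.get(doc, 0) for doc in ranked_docs[:k]]
    # The ideal ranking sorts all judged documents by relevance.
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


qrels = {"d1": 2, "d2": 1, "d3": 0}
before = ndcg_at_k(["d3", "d1", "d2"], qrels, k=3)  # best document ranked second
after = ndcg_at_k(["d1", "d2", "d3"], qrels, k=3)   # ideal order after re-ranking
```

Because the ideal DCG is computed from the judged documents, a re-ranking that moves highly relevant papers toward the top raises nDCG toward 1.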

13. An intrinsic evaluation of the Waterloo spam rankings of the ClueWeb09 and ClueWeb12 datasets

Author: İbrahim Barış Yılmazel and Ahmet Arslan
Subjects: Successor cardinal, Information retrieval, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 05 social sciences, Web retrieval, 02 engineering and technology, Library and Information Sciences, Spamdexing, ComputingMethodologies_PATTERNRECOGNITION, 020204 information systems, Web page, 0202 electrical engineering, electronic engineering, information engineering, 0509 other social sciences, 050904 information & library sciences, Text Retrieval Conference, Information Systems
Abstract: The ClueWeb09 dataset and its successor, the ClueWeb12 dataset, are two of the largest collections of Web pages released by the Text REtrieval Conference (TREC). The ClueWeb datasets were used in various TREC tracks that ran from 2009 to 2017. Each year, approximately 50 new queries are released and a pool of Web pages is judged against these queries by human assessors as relevant, non-relevant or spam. In this article, a ground truth for binary classification (spam vs. non-spam) is constructed from Web pages judged as spam or relevant, under the assumption that a Web page judged relevant for any query cannot be spam. Based on this ground truth, we evaluate the classification performance of the Waterloo spam rankings (Fusion, Britney, GroupX and UK2006), which have traditionally been used to identify and filter spam pages in retrieval systems. The experimental results in terms of the universal binary classification evaluation measures suggest that Fusion (with threshold = 11%) performs best on the ClueWeb09 dataset. Analysis of the frequency distributions of relevant/spam documents over spam scores reveals that GroupX is the most powerful at identifying relevant documents, whereas Fusion is the most powerful at identifying spam documents. It is also confirmed that the Fusion spam ranking is less effective on the ClueWeb12 dataset than on ClueWeb09.
Published: 2019
Full Text: View/download PDF

14. ESLMT: a new clustering method for biomedical document retrieval

Author: Fatemeh Serpush and Mohammad Reza Keyvanpour
Subjects: 0303 health sciences, Thesaurus (information retrieval), Models, Statistical, Information retrieval, Computer science, Process (engineering), InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Biomedical Engineering, MEDLINE, Information Storage and Retrieval, 02 engineering and technology, Biomedical Enhancement, 03 medical and health sciences, Resource (project management), Vocabulary, Controlled, 0202 electrical engineering, electronic engineering, information engineering, Cluster Analysis, Humans, 020201 artificial intelligence & image processing, Language model, Document retrieval, Cluster analysis, Text Retrieval Conference, 030304 developmental biology
Abstract: MEDLINE is a rapidly growing database; to utilize this resource, practitioners and biomedical researchers have had to deal with tedious and time-consuming tasks such as discovering, searching, reading and evaluating biomedical documents. However, labeling a group of biomedical documents is expensive and requires a complicated operation. In addition, compound words, polysemy and synonymy can affect search in MEDLINE. Therefore, designing an efficient way of sharing knowledge and organizing information is essential so that information retrieval systems can provide ideal outcomes. For this purpose, different strategies are used in the retrieval of biomedical documents (RBD). However, a number of results unrelated to the user's query are still obtained in the RBD process. Studies have shown that well-defined clusters in the retrieval system perform more efficiently than document-based retrieval. Accordingly, the present study proposes Expanding Statistical Language Modeling and Thesaurus (ESLMT) for clustering and retrieving biomedical documents. The results showed that Clustering with ESLM Similarity and Thesaurus (CESLMST) achieved higher values than the other compared methods on all criteria considered in this study. The results also indicated that mean average precision (MAP) improved with the Clusters' Retrieval Derived from ESLM Similarity-Query (CRDESLMS-QET) method in comparison to previous methods on the Text REtrieval Conference (TREC) data set.
Published: 2019
Full Text: View/download PDF

15. A simple kernel co‐occurrence‐based enhancement for pseudo‐relevance feedback

Author: Tingting He, Jimmy Xiangji Huang, Mao Zhiming, Xinhui Tu, Zhiwei Ying, and Min Pan
Subjects: Information Systems and Management, Term Discrimination, Computer Networks and Communications, Computer science, 05 social sciences, Relevance feedback, 02 engineering and technology, Library and Information Sciences, computer.software_genre, Term (time), Data set, Query expansion, 020204 information systems, Kernel (statistics), 0202 electrical engineering, electronic engineering, information engineering, Relevance (information retrieval), Language model, Data mining, 0509 other social sciences, 050904 information & library sciences, computer, Text Retrieval Conference, Information Systems
Abstract: Pseudo‐relevance feedback is a well‐studied query expansion technique in which it is assumed that the top‐ranked documents in an initial set of retrieval results are relevant and expansion terms are then extracted from those documents. When selecting expansion terms, most traditional models do not simultaneously consider term frequency and the co‐occurrence relationships between candidate terms and query terms. Intuitively, however, a term that has a higher co‐occurrence with a query term is more likely to be related to the query topic. In this article, we propose a kernel co‐occurrence‐based framework to enhance retrieval performance by integrating term co‐occurrence information into the Rocchio model and a relevance language model (RM3). Specifically, a kernel co‐occurrence‐based Rocchio method (KRoc) and a kernel co‐occurrence‐based RM3 method (KRM3) are proposed. In our framework, co‐occurrence information is incorporated into both the factor of the term discrimination power and the factor of the within‐document term weight to boost retrieval performance. The results of a series of experiments show that our proposed methods significantly outperform the corresponding strong baselines over all data sets in terms of the mean average precision and over most data sets in terms of P@10. A direct comparison on standard Text Retrieval Conference data sets indicates that our proposed methods are at least comparable to state‐of‐the‐art approaches.
Published: 2019
Full Text: View/download PDF
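
A minimal Python sketch of the plain Rocchio pseudo-relevance feedback step that the abstract builds on: treat the top-ranked documents as relevant, add their centroid to the query vector, and keep the highest-weighted new terms as expansions (q' = alpha · q + beta · centroid, with no negative feedback). Document vectors use length-normalized term frequency as a simplifying assumption (TF-IDF is also common), the parameter values are hypothetical, and the paper's kernel co-occurrence weighting (KRoc/KRM3) is not reproduced here.

```python
from collections import Counter

# Plain Rocchio PRF: q' = alpha * q + beta * centroid(feedback docs).
# Length-normalized TF vectors and default parameters are assumptions.

def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, n_terms=2):
    """query_terms: list of terms; feedback_docs: token lists of the top-ranked
    documents, assumed relevant. Returns {term: weight} for the original query
    terms plus the n_terms highest-weighted new terms."""
    weights = Counter({t: alpha * c for t, c in Counter(query_terms).items()})
    centroid = Counter()
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(doc) / len(feedback_docs)
    for term, w in centroid.items():
        weights[term] += beta * w
    # Rank candidate expansion terms by weight; ties broken alphabetically.
    candidates = sorted(
        (t for t in weights if t not in query_terms),
        key=lambda t: (-weights[t], t),
    )
    keep = set(query_terms) | set(candidates[:n_terms])
    return {t: w for t, w in weights.items() if t in keep}


expanded = rocchio_expand(
    ["jaguar"],
    [["jaguar", "car", "speed", "car"], ["jaguar", "car", "engine"]],
)
```

This baseline weights candidates only by their frequency in the feedback documents; the abstract's point is that it ignores how strongly each candidate co-occurs with the query terms.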

16. Document Retrieval for Precision Medicine Using a Deep Learning Ensemble Method

Author: Zhiqiang Liu, Jingkun Feng, Zhihao Yang, and Lei Wang
Subjects: Matching (statistics), Boosting (machine learning), Computer science, precision medicine, Computer applications to medicine. Medical informatics, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, document ranking, R858-859.7, Health Informatics, 02 engineering and technology, Ranking (information retrieval), Query expansion, Health Information Management, biomedical information retrieval, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Relevance (information retrieval), Document retrieval, Text Retrieval Conference, Original Paper, Information retrieval, business.industry, Deep learning, deep learning, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: Background: With the development of biomedicine, the number of biomedical documents has increased rapidly, bringing a great challenge for researchers trying to retrieve the information they need. Information retrieval aims to meet this challenge by searching relevant documents from abundant documents based on the given query. However, sometimes the relevance of search results needs to be evaluated from multiple aspects in specific retrieval tasks, thereby increasing the difficulty of biomedical information retrieval. Objective: This study aimed to find a more systematic method for retrieving relevant scientific literature for a given patient. Methods: In the initial retrieval stage, we supplemented query terms through query expansion strategies and applied query boosting to obtain an initial ranking list of relevant documents. In the re-ranking phase, we employed a text classification model and relevance matching model to evaluate documents from different dimensions and then combined the outputs through logistic regression to re-rank all the documents from the initial ranking list. Results: The proposed ensemble method contributed to the improvement of biomedical retrieval performance. Compared with the existing deep learning–based methods, experimental results showed that our method achieved state-of-the-art performance on the data collection provided by the Text Retrieval Conference 2019 Precision Medicine Track. Conclusions: In this paper, we proposed a novel ensemble method based on deep learning. As shown in the experiments, the strategies we used in the initial retrieval phase such as query expansion and query boosting are effective. The application of the text classification model and relevance matching model better captured semantic context information and improved retrieval performance.
Published: 2021

17. Incorporating Intra-Query Term Dependencies in an Aspect Query Language Model.

Author: Song, Dawei, Shi, Yanjie, Zhang, Peng, Huang, Qiang, Kruschwitz, Udo, Hou, Yuexian, and Wang, Bo
Subjects: *QUERY languages (Computer science), *INFORMATION retrieval software, *MARKOV processes, *QUERY (Information retrieval system), TEXT Retrieval Conference
Abstract: Query language modeling based on relevance feedback has been widely applied to improve the effectiveness of information retrieval. However, intra-query term dependencies (i.e., the dependencies between different query terms and term combinations) have not yet been sufficiently addressed in the existing approaches. This article aims to investigate this issue within a comprehensive framework, namely the Aspect Query Language Model (AM). We propose to extend the AM with a hidden Markov model (HMM) structure to incorporate the intra-query term dependencies and learn the structure of a novel aspect HMM (AHMM) for query language modeling. In the proposed AHMM, the combinations of query terms are viewed as latent variables representing query aspects. They further form an ergodic HMM, where the dependencies between latent variables (nodes) are modeled as the transition probabilities. The segmented chunks from the feedback documents are considered as observables of the HMM. Then the AHMM structure is optimized by the HMM, which can estimate the prior of the latent variables and the probability distribution of the observed chunks. Our extensive experiments on three large-scale Text Retrieval Conference (TREC) collections have shown that our method not only significantly outperforms a number of strong baselines in terms of both effectiveness and robustness but also achieves better results than the AM and another state-of-the-art approach, namely the latent concept expansion model. © 2014 Wiley Periodicals, Inc. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

18. Synthesizing visual digital library research to formulate a user-centered evaluation framework.

Author: Albertson, Dan
Subjects: DIGITAL libraries, COLLEGE curriculum, ACADEMIC libraries, TEXT Retrieval Conference, INFORMATION processing
Abstract: Purpose -- The purpose of this study is to synthesize prior user-centered research to develop and present a generalized framework for evaluating visual (i.e. both image and video) digital libraries. The primary objectives include comprehensively examining the current state of visual digital library research to: develop a generalized framework applicable for designing user-centered evaluations of visual digital libraries; identify influential experimental factors warranting evaluation as part of specific contexts; and provide examples of applied methods that have been used in research, demonstrating notable findings. Design/methodology/approach -- The framework presented in the present study depicts a set of user-centered methodological considerations and examples, synthesized from a review of prior research that provides significant understanding of users and uses of visual information. Findings -- Primary components for digital library evaluation, pertaining to user, interaction, system, and domain and topic, and their implications for interactive research are presented. Methods, examples and discussion are presented for each primary evaluation component of the framework. Practical implications -- Previously applied evaluations and their significance are described and presented as part of the developed framework, conveying the importance of each component for practical application in future research and development of interactive visual digital libraries. Originality/value -- Visual digital libraries warrant individual assessment, apart from other types of digital collections, as they offer users more ways to retrieve and interact with collection items. The present study complements prior digital library evaluation research by demonstrating the need for a separate framework due to variations influenced by visual information and by reporting on evaluations from different perspectives. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

19. The Effect of Using Light Stemming for Arabic Text Classification

Author: Ryan Alturki, Ahmad Hamadeen, Qusay Bsoul, Jaffar Atwan, Mohammad Wedyan, and Mohammed Abdulaziz Ikram
Subjects: General Computer Science, business.industry, Computer science, Arabic, Semitic languages, computer.software_genre, language.human_language, Set (abstract data type), Naive Bayes classifier, Classifier (linguistics), language, Preprocessor, Artificial intelligence, business, Text Retrieval Conference, computer, Natural language processing
Abstract: Arabic is an ancient Semitic language and one of the six official languages of the UN. Arabic text classification plays a significant and essential role in modern applications. Handling Arabic text differs considerably from handling English text, and preprocessing is particularly challenging for Arabic. This paper presents the implementation of a Naive Bayes classifier for Arabic text with and without a stemmer. A set of 800 documents in four categories from the Text Retrieval Conference (TREC) 2001 dataset was used. The results showed that Naive Bayes with a light stemmer achieves better results than Naive Bayes without a stemmer: with light stemming as a preprocessing step, the classifier reached an accuracy of 35.0745%, compared with 33.831% without stemming. Thus, applying the light stemmer improved the classifier's accuracy.
Published: 2021
Full Text: View/download PDF

20. Relevance behaviour in TREC.

Author: Ruthven, Ian
Subjects: *RELEVANCE, *RELEVANCE ranking (Information science), *JUDGMENT (Psychology), *DATA mining, TEXT Retrieval Conference
Abstract: Purpose - The purpose of this paper is to examine how various types of TREC data can be used to better understand relevance and serve as a test-bed for exploring relevance. The author proposes that there are many interesting studies that can be performed on the TREC data collections that are not directly related to evaluating systems but to learning more about human judgements of information and relevance, and that these studies can provide useful research questions for other types of investigation. Design/methodology/approach - Through several case studies the author shows how existing data from TREC can be used to learn more about the factors that may affect relevance judgements and interactive search decisions, and to answer new research questions for exploring relevance. Findings - The paper uncovers factors, such as familiarity, interest and strictness of relevance criteria, that affect the nature of relevance assessments within TREC, contrasting these against findings from user studies of relevance. Research limitations/implications - The research only considers certain uses of TREC data and assessments given by professional relevance assessors, but motivates further exploration of the TREC data so that the research community can further exploit the effort involved in the construction of TREC test collections. Originality/value - The paper presents an original viewpoint on relevance investigations and TREC itself by motivating TREC as a source of inspiration on understanding relevance rather than purely as a source of evaluation material. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

21. Query Expansion Framework Leveraging Clinical Diagnosis Information Ontology

Author: Hesham El-Sayed, Sumbal Malik, Manzoor Ahmed Khan, and Umar Shoaib
Subjects: Information retrieval, Computer science, Unified Medical Language System, 0211 other engineering and technologies, 020101 civil engineering, 02 engineering and technology, Ontology (information science), Clinical decision support system, 0201 civil engineering, Terminology, Search engine, Query expansion, Unified Modeling Language, 021105 building & construction, Text Retrieval Conference, computer, computer.programming_language
Abstract: The explosive growth of biomedical literature has made it difficult for biomedical scientists to locate precise articles and keep up to date with the latest knowledge. In biomedical literature retrieval (BLR), the heterogeneity of medical terminology and jargon leads to query mismatch (QM). Query expansion approaches significantly reduce query mismatch by incorporating and re-weighting additional similar terms in the original query. Reliance on medical ontologies to alleviate QM has garnered significant attention in biomedical literature retrieval. However, reliance on these ontologies alone is not sufficient to retrieve relevant results. Considering this, in this article we design and implement a fusion query expansion framework that combines clinical diagnosis information (CDI) and medical ontologies (MO) to mitigate the query mismatch problem. In the proposed system, we explored the top three MOs (MeSH, UMLS, SNOMED CT) to select candidate expansion terms. The outputs of the ontologies are then integrated with clinical diagnosis information predicted from unstructured knowledge bases to obtain the best query combination, leading to more focused BLR. The experimental results obtained on the Text REtrieval Conference (TREC) Clinical Decision Support (CDS) dataset show that this fusion QE framework performed significantly better when CDI and the MeSH ontology were used jointly to retrieve articles. Furthermore, our results demonstrate the notable ability of the proposed framework to help search engines reduce QM in biomedical literature retrieval. We expect that the proposed approach will help investigators use this query combination to retrieve relevant articles.
Published: 2020
Full Text: View/download PDF

22. Automatic Identification of High Impact Relevant Articles to Support Clinical Decision Making Using Attention-Based Deep Learning

Author: Asim Abbas, Jamil Hussain, Beomjoo Park, Muhammad Afzal, and Sungyoung Lee
Subjects: clinical decision support, Word embedding, Computer Networks and Communications, Computer science, precision medicine, lcsh:TK7800-8360, Clinical decision support system, 03 medical and health sciences, 0302 clinical medicine, health management, health communication, 030212 general & internal medicine, Electrical and Electronic Engineering, Text Retrieval Conference, Health communication, 030304 developmental biology, Transformer (machine learning model), 0303 health sciences, Information retrieval, business.industry, Deep learning, lcsh:Electronics, deep learning, healthcare, machine learning, Ranking, Hardware and Architecture, Control and Systems Engineering, Signal Processing, Artificial intelligence, business
Abstract: To support evidence-based precision medicine and clinical decision-making, we need to identify accurate, appropriate, and clinically relevant studies from voluminous biomedical literature. To address the issue of accurately identifying high-impact relevant articles, we propose a novel attention-based deep learning approach for finding and ranking relevant studies against a topic of interest. To train the proposed model, we collected 240,324 clinical articles from the 2018 Precision Medicine track of the Text REtrieval Conference (TREC) to identify and rank relevant documents matched with the user query. We built a BERT (Bidirectional Encoder Representations from Transformers) based classification model to classify high- and low-impact articles. We used contextualized word embeddings to create vectors of the documents and of user queries combined with genetic information, and computed their contextual similarity to determine the relevance score used to rank the articles. We compared our model's results with existing approaches and obtained a higher accuracy of 95.44% compared to 94.57% (the next best performer), as well as a higher precision by about 14% at P@5 (precision at 5) and about 12% at P@10 (precision at 10). The contextually viable and competitive outcomes confirm the suitability of the proposed model for use in domains like evidence-based precision medicine.
Published: 2020
Full Text: View/download PDF
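
The abstract above reports gains at P@5 and P@10. Precision at k is the fraction of the top-k ranked documents that are judged relevant. The following is a minimal Python sketch of that measure only, not the authors' BERT-based pipeline; the run, the relevance set, and the function name `precision_at_k` are hypothetical and chosen for illustration.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked document IDs that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranking[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Divide by k even if fewer than k documents were retrieved (the usual TREC convention).
    return hits / k

# Hypothetical ranked run and relevance judgements for a single topic.
run = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
qrels = {"d3", "d1", "d4", "d5"}
print(precision_at_k(run, qrels, 5))   # 3 relevant in the top 5 -> 0.6
print(precision_at_k(run, qrels, 10))  # 4 relevant in the top 10 -> 0.4
```

Averaging this value over all topics gives the mean P@k figure typically reported for a run.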

23. An improved BM25 algorithm for clinical decision support in Precision Medicine based on co-word analysis and Cuckoo Search

Author: Zhang, Zicheng
Subjects: Vocabulary, 020205 medical informatics, Computer science, media_common.quotation_subject, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Information Storage and Retrieval, Health Informatics, 02 engineering and technology, Ontology (information science), lcsh:Computer applications to medicine. Medical informatics, Clinical decision support system, Semantic network, Query expansion, 0202 electrical engineering, electronic engineering, information engineering, Information retrieval, Humans, Relevance (information retrieval), Precision Medicine, Cuckoo search, Text Retrieval Conference, media_common, Health Policy, Clinical decision support, Decision Support Systems, Clinical, Co-word analysis, Improved BM25, Computer Science Applications, Cuckoo Search, lcsh:R858-859.7, 020201 artificial intelligence & image processing, Algorithm, Algorithms, Research Article
Abstract: Background Retrieving gene and disease information from a vast collection of biomedical abstracts to provide doctors with clinical decision support is one of the important research directions of Precision Medicine. Method We propose a novel article retrieval method based on expanded-word and co-word analyses, using Cuckoo Search to optimize the parameters of the retrieval function. The main goal is to retrieve the abstracts of biomedical articles that refer to treatments. The methods described in this manuscript adopt the BM25 algorithm to calculate the score of abstracts. However, we propose an improved version of BM25 that computes the scores of expanded words and co-words, leading to a composite retrieval function, which is then optimized using Cuckoo Search. The proposed method aims to find both disease and gene information in the abstract of the same biomedical article, so as to achieve higher relevance and hence higher article scores. Besides, we investigate the influence of different parameters on the retrieval algorithm and summarize how they meet various retrieval needs. Results The data used in this manuscript are sourced from medical articles presented in the Text Retrieval Conference (TREC): Clinical Decision Support (CDS) Tracks of 2017, 2018, and 2019 in Precision Medicine. A total of 120 topics are tested. Three indicators are employed to compare the methods, which are selected from among those based only on the BM25 algorithm and its improved version so that the experiments are comparable. The results showed that the proposed algorithm achieves better results. Conclusion The proposed method, an improved version of the BM25 algorithm, utilizes both co-word implementation and Cuckoo Search and has been verified to achieve better results on a large number of experimental sets. Besides, a relatively simple query expansion method is implemented in this manuscript. Future research will focus on ontology and semantic networks to expand the query vocabulary.
Published: 2020
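
For readers unfamiliar with the baseline this entry builds on, the following is a minimal Python sketch of standard Okapi BM25 scoring over pre-tokenized documents. It does not reproduce the authors' improved BM25 (the expanded-word and co-word terms) or their Cuckoo Search tuning; the default parameters k1 = 1.2 and b = 0.75, the non-negative IDF variant, the function name `bm25_scores`, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with standard Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization relative to the average document length.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Non-negative IDF variant (an assumption; other BM25 variants differ here).
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(score)
    return scores

# Hypothetical toy abstracts, already tokenized.
docs = [
    ["braf", "melanoma", "treatment"],
    ["lung", "cancer", "egfr"],
    ["melanoma", "braf", "braf", "inhibitor", "trial"],
]
print(bm25_scores(["braf", "melanoma"], docs))
```

In a setup like the one described above, k1 and b are exactly the kind of parameters that an optimizer such as Cuckoo Search could tune against relevance judgements.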

24. Coopetition in IR Research

Author: Ellen M. Voorhees
Subjects: business.industry, Computer science, Coopetition, Competitor analysis, Overfitting, Data science, Field (computer science), Management Information Systems, Task (project management), Test (assessment), Hardware and Architecture, The Internet, business, Text Retrieval Conference
Abstract: Coopetitions are activities in which competitors cooperate for a common good. Community evaluations such as the Text REtrieval Conference (TREC) are prototypical examples of coopetitions in information retrieval (IR) and have now been part of the field for almost thirty years. This longevity and the proliferation of shared evaluation tasks suggest that, indeed, the net impact of community evaluations is positive. But what are these benefits, and what are the attendant costs? This talk will use TREC tracks as case studies to explore the benefits and disadvantages of different evaluation task designs. Coopetitions can improve state-of-the-art effectiveness for a retrieval task by establishing a research cohort and constructing the infrastructure---including problem definition, test collections, scoring metrics, and research methodology---necessary to make progress on the task. They can also facilitate technology transfer and amortize the infrastructure costs. The primary danger of coopetitions is for an entire research community to overfit to some peculiarity of the evaluation task. This risk can be minimized by building multiple test sets and regularly updating the evaluation task.
Published: 2020
Full Text: View/download PDF

25. Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval

Author: Limin Chen, Zhiwen Tang, and Grace Hui Yang
Subjects: FOS: Computer and information sciences, Text corpus, Information retrieval, Computer Science - Artificial Intelligence, Computer science, 05 social sciences, Training (meteorology), Sample (statistics), 02 engineering and technology, 050905 science studies, Computer Science - Information Retrieval, Domain (software engineering), Artificial Intelligence (cs.AI), 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, 020201 artificial intelligence & image processing, Relevance (information retrieval), 0509 other social sciences, Text Retrieval Conference, Information Retrieval (cs.IR)
Abstract: Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent who learns while interacting, a long-term and complex goal, and an algorithm that explores and adapts. To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents, which are notoriously sample-inefficient. However, in a text corpus annotated for a given query, it is not the relevant documents but the irrelevant documents that predominate. This causes very unbalanced training experiences for the agent and prevents it from learning an effective policy. Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training. Our experimental results on the Text REtrieval Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method is able to boost an RL agent's learning effectiveness by 22% in dealing with unseen situations. Comment: Accepted by SIGIR 2020
Published: 2020
Full Text: View/download PDF

26. Improving Sentence Retrieval Using Sequence Similarity

Author: Sven Gotovac, Alen Doko, and Ivan Boban
Subjects: Matching (statistics), language modeling, TF−ISF, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 02 engineering and technology, computer.software_genre, lcsh:Technology, Novelty detection, lcsh:Chemistry, 020204 information systems, sentence retrieval, BM25, partial match, sequence similarity, 0202 electrical engineering, electronic engineering, information engineering, Question answering, General Materials Science, Document retrieval, lcsh:QH301-705.5, Instrumentation, Text Retrieval Conference, Fluid Flow and Transfer Processes, lcsh:T, business.industry, Process Chemistry and Technology, General Engineering, Novelty, lcsh:QC1-999, Computer Science Applications, lcsh:Biology (General), lcsh:QD1-999, lcsh:TA1-2040, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Intelligente und verteilte Systeme, 020201 artificial intelligence & image processing, Language model, Artificial intelligence, Institut für Softwaretechnologie, lcsh:Engineering (General). Civil engineering (General), business, computer, lcsh:Physics, Natural language processing, Sentence
Abstract: Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks like question answering (QA) or novelty detection. Since it is similar to document retrieval but with a smaller unit of retrieval, document retrieval methods such as term frequency-inverse document frequency (TF-IDF), BM25, and language modeling-based methods are also used for sentence retrieval. The effect of partial matching of words on sentence retrieval has not been analyzed. We think that there is substantial potential for improving sentence retrieval methods if we consider this approach. We adapted TF-ISF, BM25, and language modeling-based methods to test the partial matching of terms by combining sentence retrieval with sequence similarity, which allows matching of words that are similar but not identical. All tests were conducted using data from the novelty tracks of the Text Retrieval Conference (TREC). The scope of this paper was to find out whether such an approach is generally beneficial to sentence retrieval. However, we did not examine in depth how partial matching helps or hinders the finding of relevant sentences.
Published: 2020
Full Text: View/download PDF

27. Evaluating Multimedia and Language Tasks

Author: Asad A. Butt, Ian Soboroff, Keith Curtis, and George M. Awad
Subjects: multimedia, evaluation, Multimedia, Multimedia search, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, information retrieval (IR), Information access, computer.software_genre, TRECVID, lcsh:QA75.5-76.95, metrics, Annotation, annotation, Artificial Intelligence, Perspective, Question answering, NIST, lcsh:Electronic computers. Computer science, Text Retrieval Conference, computer
Abstract: Evaluating information access tasks, including textual and multimedia search, question answering, and understanding, has been the core mission of NIST's Retrieval Group since 1989. The TRECVID Evaluations of Multimedia Access began in 2001 with the goal of driving content-based search technology for multimedia, just as its progenitor, the Text Retrieval Conference (TREC), did for text and the web.
Published: 2020
Full Text: View/download PDF

28. What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way

Author: Udo Hahn, Erik Faessler, and Michel Oleynik
Subjects: FOS: Computer and information sciences, 021103 operations research, Information retrieval, Stop words, Boosting (machine learning), J.3, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 0211 other engineering and technologies, 02 engineering and technology, 68P20 (Primary), 92C50, 92Dxx (Secondary), Weighting, Computer Science - Information Retrieval, H.3.3, H.3.1, Search engine, Query expansion, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), Metric (unit), Text Retrieval Conference, Information Retrieval (cs.IR)
Abstract: From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite the many performance measurements carried out in these evaluation campaigns, the scientific community remains uncertain about the impact that individual system features and their weights have on overall system performance. To close this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data from the three TREC-PM installments to evaluate the effectiveness of different features using the commonly shared infNDCG metric. Comment: Accepted for SIGIR 2020, 10 pages
Published: 2020
Full Text: View/download PDF

29. Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting

Author: Sheng-Chieh Lin, Chuan-Ju Wang, Jimmy Lin, Ming-Feng Tsai, Rodrigo Nogueira, and Jheng-Hong Yang
Subjects: FOS: Computer and information sciences, Computer Science - Artificial Intelligence, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Context (language use), 02 engineering and technology, computer.software_genre, Computer Science - Information Retrieval, 020204 information systems, Component (UML), 0202 electrical engineering, electronic engineering, information engineering, Text Retrieval Conference, Coreference, Computer Science - Computation and Language, business.industry, Information seeking, 05 social sciences, Rank (computer programming), General Business, Management and Accounting, Computer Science Applications, Term (time), Artificial Intelligence (cs.AI), Artificial intelligence, 0509 other social sciences, 050904 information & library sciences, business, computer, Computation and Language (cs.CL), Natural language processing, Natural language, Information Retrieval (cs.IR), Information Systems
Abstract: Conversational search plays a vital role in conversational information seeking. As queries in information-seeking dialogues are ambiguous for traditional ad-hoc information retrieval (IR) systems due to the coreference and omission resolution problems inherent in natural language dialogue, resolving these ambiguities is crucial. In this paper, we tackle conversational passage retrieval (ConvPR), an important component of conversational search, by addressing query ambiguities with query reformulation integrated into a multi-stage ad-hoc IR system. Specifically, we propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting. For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals. For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-to-sequence model. Detailed analyses of the two CQR methods are provided quantitatively and qualitatively, explaining their advantages, disadvantages, and distinct behaviors. Moreover, to leverage the strengths of both CQR methods, we propose combining their output with reciprocal rank fusion, yielding state-of-the-art retrieval effectiveness: a 30% improvement in terms of NDCG@3 compared to the best submission of TREC CAsT 2019. Comment: 28 pages. Accepted to ACM Transactions on Information Systems, Special Issue on Conversational Search and Recommendation. The first two authors contributed equally. Code: https://github.com/castorini/chatty-goose
Published: 2020
Full Text: View/download PDF
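
This entry combines the output of its two reformulation methods with reciprocal rank fusion (RRF). In the commonly used form of RRF, each document receives the sum of 1 / (k + rank) over the ranked lists it appears in, with 1-based ranks and a constant k, often 60. The following Python sketch shows that general technique only; it is not taken from the chatty-goose code, and the value k = 60, the function name `reciprocal_rank_fusion`, and the passage IDs are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs; rank positions are 1-based."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for a deterministic order.
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical rankings, e.g. from term-importance expansion and from neural rewriting.
run_a = ["p1", "p2", "p3"]
run_b = ["p2", "p4", "p1"]
print(reciprocal_rank_fusion([run_a, run_b]))  # ['p2', 'p1', 'p4', 'p3']
```

Because RRF uses only rank positions, it can fuse runs whose raw scores are on incomparable scales without any score normalization.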

30. Bees swarm optimization guided by data mining techniques for document information retrieval

Author: Youcef Djenouri, Asma Belhadi, and Riadh Belkebir
Subjects: Information retrieval, business.industry, Computer science, Big data, General Engineering, Swarm behaviour, 02 engineering and technology, Space (commercial competition), computer.software_genre, Field (computer science), Computer Science Applications, Artificial Intelligence, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, business, computer, Text Retrieval Conference
Abstract: This paper explores advances in the data mining field to solve the fundamental Document Information Retrieval problem. In the proposed approach, useful knowledge is first discovered by using data mining techniques, and swarms then use this knowledge to explore the whole space of documents intelligently. We have investigated two data mining techniques in the preprocessing step. The first splits the collection of documents into clusters of similar documents using the K-means algorithm, while the second extracts the closed frequent terms in each of these clusters using the DCI_Closed algorithm. For the solving step, BSO (Bees Swarm Optimization) is used to explore each cluster of documents in depth. The proposed approach has been evaluated on well-known collections such as CACM (Collection of ACM), TREC (Text REtrieval Conference), Webdocs, and Wikilinks, and it has been compared to state-of-the-art data mining, bio-inspired and other document information retrieval approaches. The results show that the proposed approach improves the quality of returned documents considerably, with a competitive computational time compared to state-of-the-art approaches.
Published: 2018
Full Text: View/download PDF

31. Identifying top news using crowdsourcing.

Author: McCreadie, Richard, Macdonald, Craig, and Ounis, Iadh
Subjects: *CROWDSOURCING, *BLOGS, *INFORMATION retrieval, *DOCUMENTATION, TEXT Retrieval Conference
Abstract: The influential Text REtrieval Conference (TREC) has always relied upon specialist assessors or, occasionally, participating groups to create relevance judgements for the tracks that it runs. Recently, however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news stories task using crowdsourcing. In particular, we crowdsourced both real-time newsworthiness assessments for news stories and traditional relevance assessments for blog posts. We conclude that crowdsourcing appears to be not only a feasible but also a cheap and fast means to generate relevance assessments. Furthermore, we detail our experiences running the crowdsourced evaluation of the TREC Blog track, discuss the lessons learned, and provide best practices. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

32. 5.5 Research on System and Process Design.

Subjects: MANAGEMENT science, ELECTRONIC information resource searching, TEXT Retrieval Conference, TECHNOLOGY, ELECTRONIC discovery (Law)
Abstract: The article focuses on research on system and process design. It describes the Text Retrieval Conference (TREC) Legal Track, in which both concept search and technology-assisted review were introduced into the electronic-discovery marketplace. It further reviews the evaluation results for technology-assisted review, manual review, and keyword search.
Published: 2013

33. Textual resource acquisition and engineering.

Author: Chu-Carroll, J., Fan, J., Schlaefer, N., and Zadrozny, W.
Subjects: *QUESTION answering systems, *WATSON (Computer), *TRANSMISSION of texts, *ITERATIVE methods (Mathematics), *ENCYCLOPEDIAS & dictionaries, *WEB databases, TEXT Retrieval Conference
Abstract: A key requirement for high-performing question-answering (QA) systems is access to high-quality reference corpora from which answers to questions can be hypothesized and evaluated. However, the topic of source acquisition and engineering has received very little attention so far. This is because most existing systems were developed under organized evaluation efforts that included reference corpora as part of the task specification. The task of answering Jeopardy!™ questions, on the other hand, does not come with such a well-circumscribed set of relevant resources. Therefore, it became part of the IBM Watson™ effort to develop a set of well-defined procedures to acquire high-quality resources that can effectively support a high-performing QA system. To this end, we developed three procedures, i.e., source acquisition, source transformation, and source expansion. Source acquisition is an iterative development process of acquiring new collections to cover salient topics deemed to be gaps in existing resources based on principled error analysis. Source transformation refers to the process in which information is extracted from existing sources, either as a whole or in part, and is represented in a form that the system can most easily use. Finally, source expansion attempts to increase the coverage in the content of each known topic by adding new information as well as lexical and syntactic variations of existing information extracted from external large collections. In this paper, we discuss the methodology that we developed for IBM Watson for performing acquisition, transformation, and expansion of textual resources. We demonstrate the effectiveness of each technique through its impact on candidate recall and on end-to-end QA performance. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

34. Chapter 4: Evaluation.

Author: Balog, Krisztian, Yi Fang, De Rijke, Maarten, Serdyukov, Pavel, and Luo Si
Subjects: ACQUISITION of data, INFORMATION retrieval, EVALUATION methodology, TEXT Retrieval Conference, QUERY languages (Computer science), CHARTS, diagrams, etc.
Abstract: The article provides information on several test collections and evaluation methodologies that originated from the Text Retrieval Conference (TREC), along with other collections developed by researchers for the expertise retrieval task. It discusses the evaluation of expert finding methods and of the expert profiling task, with a focus on test queries. A table presents an overview of the test collections.
Published: 2012
Full Text: View/download PDF

35. Chapter 2: Background.

Author: Balog, Krisztian, Yi Fang, De Rijke, Maarten, Serdyukov, Pavel, and Luo Si
Subjects: EXPERTISE, INFORMATION retrieval, SPECIALISTS, TEXT Retrieval Conference, EXPERT systems
Abstract: The article presents information on expertise retrieval and its related tasks which include expert finding in social networks, resource selection and entity retrieval. It discusses expertise retrieval system in information retrieval. It also focuses on the Text Retrieval Conference (TREC) enterprise track.
Published: 2012
Full Text: View/download PDF

36. Inconsistent Responsiveness Determination in Document Review: Difference of Opinion or Human Error?

Author: Grossman, Maura R. and Cormack, Gordon V.
Subjects: *LEGAL documents, *AMBIGUITY, *CIVIL procedure, *CONFIDENTIAL communications, *HUMAN error, TEXT Retrieval Conference
Abstract: This Article analyzes the inconsistency between different document review efforts on the same document collection to determine whether that inconsistency is due primarily to ambiguity in applying the definition of responsiveness to particular documents, or due primarily to human error. By examining documents from the TREC 2009 Legal Track, the Authors show that inconsistent assessments regarding the same documents are due in large part to human error. Therefore, the quality of a review effort is not simply a matter of opinion; it is possible to show objectively that some reviews, and some review methods, are better than others. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

37. Chapter 7: Future Work in Blog and Microblog Search.

Author: Santos, Rodrygo L. T., Macdonald, Craig, McCreadie, Richard, Ounis, Iadh, and Soboroff, Ian
Subjects: SEARCH engines, BLOGS, MICROBLOGS, TEXT Retrieval Conference
Abstract: Chapter 7 of the book "Foundations and Trends in Information Retrieval: Information Retrieval on the Blogosphere" is presented. It explores future directions in the field of blog and microblog search. It highlights the Text REtrieval Conference (TREC) 2009/2010 Blog track as the most popular resource for academic or industrial blog research, and Twitter as the most extensively used platform for microblogging research.
Published: 2012
Full Text: View/download PDF

38. Chapter 6: Publicly Available Resources.

Author: Santos, Rodrygo L. T., Macdonald, Craig, McCreadie, Richard, Ounis, Iadh, and Soboroff, Ian
Subjects: SEARCH engines, BLOGS, TEXT Retrieval Conference, PUBLICATIONS, INFORMATION retrieval
Abstract: Chapter 6 of the book "Foundations and Trends in Information Retrieval: Information Retrieval on the Blogosphere" is presented. It explores the resources developed to support research in the blogosphere. It discusses the Text REtrieval Conference (TREC), the Workshop on the Weblogging Ecosystem (WWE) and the International Conference on Weblogs and Social Media (ICWSM).
Published: 2012
Full Text: View/download PDF

39. E-Discovery revisited: the need for artificial intelligence beyond information retrieval.

Author: Conrad, Jack G.
Subjects: ARTIFICIAL intelligence, INFORMATION retrieval, ELECTRONIC discovery (Law), SOCIAL networks, DATA mining
Abstract: In this work, we provide a broad overview of the distinct stages of E-Discovery. We portray them as an interconnected, often complex workflow process, while relating them to the general Electronic Discovery Reference Model (EDRM). We start with the definition of E-Discovery. We then describe the very positive role that NIST's Text REtrieval Conference (TREC) has played in the science of E-Discovery, in terms of both the tasks involved and the evaluation of the legal discovery work performed. Given the critical role that data analysis plays at various stages of the process, we present a pyramid model, which complements the EDRM model: gathering and hosting; indexing; searching and navigating; and finally consolidating and summarizing E-Discovery findings. Next we discuss where the current areas of need and areas of growth appear to be, using one of the field's most authoritative surveys of providers and consumers of E-Discovery products and services. We subsequently address some areas of Artificial Intelligence, both Information Retrieval-related and not, which promise to make future contributions to the E-Discovery discipline. Some of these areas include data mining applied to e-mail and social networks, classification and machine learning, and the technologies that will enable next-generation E-Discovery. The lesson we convey is that the more IR researchers and others understand the broader context of E-Discovery, including the stages that occur before and after primary search, the greater will be the prospects for broader solutions, creative optimizations and synergies yet to be tapped. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

40. Query expansion based on term distribution and DBpedia features

Author: Hamid Bennis, Sarah Dahir, and Abderrahim El Qadi
Subjects: 0209 industrial biotechnology, Vocabulary, Information retrieval, Computer science, media_common.quotation_subject, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Rank (computer programming), General Engineering, Relevance feedback, 02 engineering and technology, Linked data, Computer Science Applications, Query expansion, 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Vocabulary mismatch, 020201 artificial intelligence & image processing, Learning to rank, Text Retrieval Conference, media_common
Abstract: Query Expansion (QE) approaches, which reformulate queries by adding new terms to the initial user query, are intended to ameliorate the vocabulary mismatch between query keywords and documents in Information Retrieval Systems (IRS). One big issue in QE is the selection of the right candidate terms for expansion. For this purpose, Linked Data can be used as a valuable resource for providing additional expansion features, such as the values of sub- and superclasses of resources. The underlying research question is whether interlinked data and vocabulary items provide features that can be taken into account for query expansion. In this paper, we introduced a new QE approach that aims to improve IRS by using the well-known distribution-based method Bose-Einstein statistics (Bo1) as well as Linked Data from the knowledge base DBpedia, with different numbers of expansion terms. We evaluated the effectiveness of each method individually, as well as their combinations, using two Text REtrieval Conference (TREC) test collections. Our approach led to significant improvements in precision, recall, Mean Average Precision (MAP) at rank 10, and normalized Discounted Cumulative Gain (nDCG) at different ranks compared to the Pseudo Relevance Feedback (PRF) baseline. The results show that the inclusion of semantic annotations clearly improves retrieval performance over the baseline method.
Published: 2021
Full Text: View/download PDF
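For readers unfamiliar with Bo1, the following is a minimal sketch of the Bose-Einstein (Bo1) term weighting as it is commonly stated in the divergence-from-randomness literature: w(t) = tf_x · log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F / N, where tf_x is the frequency of t in the top-ranked documents, F its frequency in the whole collection, and N the number of documents. It covers only the distribution-based half of the approach; how the paper combines these weights with DBpedia features is not described here. The function name and toy statistics are hypothetical.

```python
import math
from collections import Counter


def bo1_weights(top_docs, collection_tf, num_docs):
    """Score candidate expansion terms with the Bo1 (Bose-Einstein) model.

    top_docs:      token lists of the pseudo-relevant (top-ranked) documents
    collection_tf: term -> frequency of the term in the whole collection
                   (assumed to cover every term that appears in top_docs)
    num_docs:      number of documents in the collection
    """
    # tf_x: how often each term occurs in the feedback documents.
    tf_x = Counter(tok for doc in top_docs for tok in doc)
    weights = {}
    for term, tfx in tf_x.items():
        p_n = collection_tf[term] / num_docs  # expected occurrences per document
        weights[term] = tfx * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return weights


# Hypothetical toy statistics: rarer terms that recur in the feedback set score highest.
top_docs = [["query", "expansion", "dbpedia"], ["expansion", "terms", "dbpedia"]]
collection_tf = {"query": 50, "expansion": 5, "dbpedia": 2, "terms": 80}
w = bo1_weights(top_docs, collection_tf, num_docs=100)
print(sorted(w, key=w.get, reverse=True)[:2])  # ['dbpedia', 'expansion']
```

The "different numbers of expansion terms" in the abstract correspond to the cutoff applied to this sorted list (2 in the toy example).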

41. A joint deep model of entities and documents for cumulative citation recommendation

Author: Dandan Song, Lerong Ma, Lejian Liao, and Yao Ni
Subjects: Information retrieval, Computer Networks and Communications, Process (engineering), Computer science, Feature extraction, Empirical process (process control model), 020206 networking & telecommunications, 02 engineering and technology, Random forest, Set (abstract data type), Support vector machine, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), 020201 artificial intelligence & image processing, Text Retrieval Conference, Software
Abstract: Knowledge bases (such as Wikipedia) are valuable resources of human knowledge that have contributed to various applications. However, their manual maintenance creates a large lag between their contents and the up-to-date information about entities. Cumulative citation recommendation (CCR) concentrates on identifying citation-worthy documents in a large volume of stream data for a given target entity in a knowledge base. Most previous approaches first carefully extract human-designed features from entities and documents, and then leverage machine learning methods such as SVM and Random Forests to filter citation-worthy documents for target entities. Handcrafted features for entities and documents have several problems: (1) designing them is an empirical process that requires expert knowledge and thus cannot be easily generalized; (2) the quality of the hand-designed features strongly affects performance; (3) the feature extraction process is resource-dependent and time-consuming. In this paper, we present a Joint Deep Neural Network Model of Entities and Documents for CCR, termed DeepJoED, which identifies highly related documents for given entities with several layers of neurons, automatically learns feature extraction for entities and documents, and trains the networks in an end-to-end fashion. An extensive set of experiments has been conducted on the benchmark dataset provided in the Text REtrieval Conference (TREC) Knowledge Base Acceleration (KBA) task in 2012. The results show that the model can bring a significant improvement over the state-of-the-art results on this dataset for CCR.
Published: 2017
Full Text: View/download PDF

42. A Composite Natural Language Processing and Information Retrieval Approach to Question Answering Using a Structured Knowledge Base

Author: Ajay Bansal and Avani Chandurkar
Subjects: Linguistics and Language, Computer Networks and Communications, Process (engineering), Computer science, 0102 computer and information sciences, 02 engineering and technology, computer.software_genre, 01 natural sciences, Task (project management), World Wide Web, Search engine, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Question answering, Text Retrieval Conference, Focus (computing), Information retrieval, business.industry, Computer Science Applications, Knowledge base, 010201 computation theory & mathematics, 020201 artificial intelligence & image processing, The Internet, Artificial intelligence, business, computer, Software, Natural language processing, Information Systems
Abstract: With the inception of the World Wide Web, the amount of data present on the Internet is tremendous. This makes the task of navigating through this enormous amount of data quite difficult for the user. As users struggle to navigate through this wealth of information, the need for an automated system that can extract the required information becomes urgent. This paper presents a Question Answering system to ease the process of information retrieval. Question Answering systems have been around for quite some time and are a subfield of information retrieval and natural language processing. The task of any Question Answering system is to find an answer to a free-form factual question. The difficulty of pinpointing and verifying the precise answer makes question answering more challenging than the simple information retrieval done by search engines. The research objective of this paper is to develop a novel approach to Question Answering based on a composition of conventional approaches from Information Retrieval (IR) and Natural Language Processing (NLP). The focus is on using a structured and annotated knowledge base instead of an unstructured one. The knowledge base used here is DBpedia, and the final system is evaluated on the Text REtrieval Conference (TREC) 2004 questions dataset.
Published: 2017
Full Text: View/download PDF

43. Evaluating the impact of MeSH (Medical Subject Headings) terms on different types of searchers

Author: Nina Wacholder and Ying-Hsang Liu
Subjects: Information retrieval, Recall, Computer science, 05 social sciences, Information needs, Subject (documents), 02 engineering and technology, Library and Information Sciences, Management Science and Operations Research, Computer Science Applications, Domain (software engineering), Controlled vocabulary, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, Domain knowledge, 020201 artificial intelligence & image processing, Relevance (information retrieval), 0509 other social sciences, 050904 information & library sciences, Text Retrieval Conference, Information Systems
Abstract: The usefulness of controlled vocabularies, exemplified by Medical Subject Headings (MeSH), was evaluated by a controlled user experiment. MeSH terms were most useful for domain experts in terms of precision. Domain knowledge was correlated with the precision score, whereas search training was correlated with the recall score. This study demonstrated the feasibility of re-using a test collection originally created for evaluating the effectiveness of retrieval techniques for a controlled user experiment. To what extent do MeSH terms improve search effectiveness for different kinds of users? We observed four different kinds of information seekers using an experimental information retrieval system: (1) search novices; (2) domain experts; (3) search experts and (4) medical librarians. Participants searched using either a version of the system in which MeSH terms were displayed or another version in which they had to formulate their own terms. The information needs were a subset of the relatively difficult topics originally created for the Text REtrieval Conference (TREC). Effectiveness of retrieval was based on the relevance judgments provided by TREC. The results of the study provide experimental evidence of the usefulness of MeSH terms and further identify the significant relationship between the user characteristics of domain knowledge and search training and the search performance in an interactive search environment.
Published: 2017
Full Text: View/download PDF

44. An Efficient Corpus-Based Stemmer

Author: Vishal Gupta and Jasmeet Singh
Subjects: Vocabulary, Information retrieval, Computer science, business.industry, Cognitive Neuroscience, media_common.quotation_subject, Full text search, 02 engineering and technology, computer.software_genre, Computer Science Applications, 020204 information systems, Inflection, 0202 electrical engineering, electronic engineering, information engineering, Vocabulary mismatch, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Stemming, Visual Word, Artificial intelligence, business, Text Retrieval Conference, computer, Word (computer architecture), Natural language processing, media_common
Abstract: Word stemming is a linguistic process in which the various inflected word forms are matched to their base form. It is among the basic text pre-processing approaches used in Natural Language Processing and Information Retrieval. Stemming is employed at the text pre-processing stage to solve the issue of vocabulary mismatch or to reduce the size of the word vocabulary, and consequently also the dimensionality of training data for statistical models. In this article, we present a fully unsupervised corpus-based text stemming method which clusters morphologically related words based on lexical knowledge. The proposed method performs cognitive-inspired computing to discover morphologically related words from the corpus without any human intervention or language-specific knowledge. The performance of the proposed method is evaluated in inflection removal (approximating lemmas) and Information Retrieval tasks. The retrieval experiments in four different languages using standard Text Retrieval Conference, Cross-Language Evaluation Forum, and Forum for Information Retrieval Evaluation collections show that the proposed stemming method performs significantly better than no stemming. In the case of highly inflectional languages, Marathi and Hungarian, the improvement in Mean Average Precision is nearly 50% as compared to unstemmed words. Moreover, the proposed unsupervised stemming method outperforms state-of-the-art strong language-independent and rule-based stemming methods in all the languages. Besides Information Retrieval, the proposed stemming method also performs significantly better in inflection removal experiments. The proposed unsupervised language-independent stemming method can be used as a multipurpose tool for various tasks such as the approximation of lemmas, improving retrieval performance or other Natural Language Processing applications.
Published: 2017
Full Text: View/download PDF
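The improvement reported in this abstract is measured in Mean Average Precision (MAP). As a reference point, here is a minimal sketch of the usual non-interpolated definition: for each topic, average the precision at the rank of every relevant document (unretrieved relevant documents contribute zero), then take the mean over topics. The function names and the toy judgments are hypothetical, and this sketch scores a topic absent from the run as 0 rather than skipping it.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision of one ranked list."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP over the judged topics; a topic missing from `runs` scores 0."""
    return sum(average_precision(runs.get(t, []), qrels[t]) for t in qrels) / len(qrels)


# Hypothetical relevance judgments (qrels) and system rankings for two topics.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2"]}
print(round(mean_average_precision(runs, qrels), 4))  # (0.8333 + 0.5) / 2 = 0.6667
```

Comparing a stemmed and an unstemmed run then amounts to computing this value for each run against the same qrels.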

45. A framework for designing retrieval effectiveness studies of library information systems using human relevance assessments

Author: Dirk Lewandowski and Christiane Behnert
Subjects: Cognitive models of information retrieval, Information retrieval, Computer science, 05 social sciences, Comparability, 050801 communication & media studies, Context (language use), Library and Information Sciences, Digital library, 0508 media and communications, Human–computer information retrieval, Information system, Relevance (information retrieval), 0509 other social sciences, 050904 information & library sciences, Text Retrieval Conference, Information Systems
Abstract: Purpose: The purpose of this paper is to demonstrate how to apply traditional information retrieval (IR) evaluation methods based on standards from the Text REtrieval Conference and web search evaluation to all types of modern library information systems (LISs) including online public access catalogues, discovery systems, and digital libraries that provide web search features to gather information from heterogeneous sources. Design/methodology/approach: The authors apply conventional procedures from IR evaluation to the LIS context considering the specific characteristics of modern library materials. Findings: The authors introduce a framework consisting of five parts: search queries, search results, assessors, testing, and data analysis. The authors show how to deal with comparability problems resulting from diverse document types, e.g., electronic articles vs printed monographs and what issues need to be considered for retrieval tests in the library context. Practical implications: The framework can be used as a guideline for conducting retrieval effectiveness studies in the library context. Originality/value: Although a considerable amount of research has been done on IR evaluation, and standards for conducting retrieval effectiveness studies do exist, to the authors’ knowledge this is the first attempt to provide a systematic framework for evaluating the retrieval effectiveness of twenty-first-century LISs. The authors demonstrate which issues must be considered and what decisions must be made by researchers prior to a retrieval test.
Published: 2017
Full Text: View/download PDF

46. Evaluation of biomedical text-mining systems: Lessons learned from information retrieval.

Author: Hersh, William
Subjects: *BIOINFORMATICS, *COMPUTERS in biology, *INFORMATION science, *MEDICINE, *INFORMATION retrieval
Abstract: Biomedical text-mining systems have great promise for improving the efficiency and productivity of biomedical researchers. However, such systems are still not in routine use. One impediment to their development is the lack of systematic and rigorous evaluation, comparable to the approaches developed for information retrieval systems. The developers of text-mining systems need to improve both test collections for system-oriented evaluation and undertake user-oriented evaluations to determine the most effective use of their systems for their intended audience. [ABSTRACT FROM AUTHOR]
Published: 2005
Full Text: View/download PDF

47. A graph-based approach for text query expansion using pseudo relevance feedback and association rules mining

Author: Taoufiq Gadi, Azzeddine Dahbi, and Siham Jabri
Subjects: Query expansion, Information retrieval, General Computer Science, Association rule learning, Computer science, Graph based, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Relevance feedback, Dominance relations, Association rules, Set (abstract data type), Ranking, Graph (abstract data type), Electrical and Electronic Engineering, Text query, TREC, Text Retrieval Conference, Pseudo-graph feedback
Abstract: Pseudo-relevance feedback is a query expansion approach in which expansion terms are selected from a set of top-ranked documents retrieved in response to the original query. However, the selected terms will not be related to the query if the top retrieved documents are irrelevant. As a result, retrieval performance for the expanded query is not improved compared to the original one. This paper proposes using the documents selected by Pseudo Relevance Feedback to generate association rules. To this end, an algorithm based on dominance relations is applied. Then the strong correlations between the query and other terms are detected, and an oriented, weighted graph called Pseudo-Graph Feedback is constructed. This graph serves to expand the original queries with terms that are semantically related and selected by the user. Experiments on a Text Retrieval Conference (TREC) collection show significant improvements, with the proposed approach achieving the best results compared to both the baseline system and an existing technique.
Published: 2019
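To make the idea of association rules over pseudo-relevant documents concrete, the following minimal sketch mines plain support/confidence rules of the form query term → candidate term. It is not the paper's dominance-relation algorithm, which is not detailed in the abstract; the function name, thresholds, and documents are hypothetical. Each returned rule could serve as a weighted edge (q → t, weight = confidence) of a graph like the Pseudo-Graph Feedback described above.

```python
from itertools import product


def term_rules(feedback_docs, query_terms, min_support=0.3, min_confidence=0.6):
    """Mine simple rules q -> t between query terms and other feedback terms.

    feedback_docs: non-empty list of term sets, one per pseudo-relevant document
    Returns (query_term, term, support, confidence) tuples.
    """
    n = len(feedback_docs)

    def support(*terms):
        # Fraction of feedback documents containing all the given terms.
        return sum(all(term in doc for term in terms) for doc in feedback_docs) / n

    candidates = sorted(set().union(*feedback_docs) - set(query_terms))
    rules = []
    for q, t in product(query_terms, candidates):
        s_q, s_qt = support(q), support(q, t)
        if s_q == 0 or s_qt < min_support:
            continue
        confidence = s_qt / s_q
        if confidence >= min_confidence:
            rules.append((q, t, s_qt, confidence))
    return rules


# Hypothetical top-ranked documents for the query "cancer".
docs = [{"cancer", "tumor", "therapy"}, {"cancer", "tumor"}, {"cancer", "diet"}, {"heart", "diet"}]
for q, t, s, c in term_rules(docs, ["cancer"]):
    print(f"{q} -> {t}: support={s:.2f}, confidence={c:.2f}")
# cancer -> tumor: support=0.50, confidence=0.67
```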

48. Contextualized Relevance Feedback for Precision Medicine

Author: Le Wang and Ze Luo
Subjects: Matching (statistics), Information retrieval, Computer science, Benchmark (computing), Relevance feedback, Context (language use), Representation (mathematics), Text Retrieval Conference, Word (computer architecture), Ranking (information retrieval)
Abstract: Precision Medicine (PM) is viewed as an information retrieval (IR) task, in which biomedical articles containing treatment information about specific diseases or genetic variants are retrieved in response to a patient record, with the aim of providing medical evidence at the point of care. Previous PM approaches are mostly based on unigram matching of individual query terms, or concepts, to the target articles to produce the ranking list, while ignoring the context of the matched query terms or concepts. To address this, this paper presents a preliminary investigation of utilizing contextualized representations of text for pseudo relevance feedback (PRF) to enhance PM search effectiveness. By considering multi-aspect word relations, we propose a $BERT_{NPRF}$ model that integrates PRF with the fine-tuned BERT model for contextualized interaction of document-document pairs. Experimental results on the standard Text REtrieval Conference (TREC) PM track benchmark show that our proposed method with interpolation can improve performance in PM.
Published: 2019
Full Text: View/download PDF
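The abstract mentions "interpolation" without giving its form. A common choice in re-ranking work is a linear interpolation of the normalized first-stage score and the neural re-ranker score; the following minimal sketch assumes that form. The weight `lam`, the min-max normalization, the function names, and the scores are all hypothetical illustrations, not the paper's settings.

```python
def min_max(scores):
    """Rescale a doc -> score mapping to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # all scores equal: avoid dividing by zero
    return {doc: (s - lo) / span for doc, s in scores.items()}


def interpolate(first_stage, rerank, lam=0.5):
    """Rank documents by lam * first-stage score + (1 - lam) * re-ranker score."""
    a, b = min_max(first_stage), min_max(rerank)
    # Documents the re-ranker did not score get its lowest normalized value (0).
    fused = {doc: lam * a[doc] + (1 - lam) * b.get(doc, 0.0) for doc in a}
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical scores: a lexical first stage and a BERT-style re-ranker disagree.
bm25 = {"d1": 12.0, "d2": 10.0, "d3": 8.0}
bert = {"d1": 0.1, "d2": 0.9, "d3": 0.5}
print(interpolate(bm25, bert))  # ['d2', 'd1', 'd3']
```

Normalizing both score lists first matters because lexical scores and neural scores live on different scales; without it, `lam` would have no consistent meaning.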

49. Query Reconstruction in Medical Case Description Using Query Performance Predictors

Author: Yao Yao, Yu Fang, and Mingming Lu
Subjects: 0303 health sciences, Information retrieval, 010504 meteorology & atmospheric sciences, Computer science, media_common.quotation_subject, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Process (computing), Electronic medical record, Case description, 01 natural sciences, Clinical decision support system, Medical literature retrieval, 03 medical and health sciences, Quality (business), Text Retrieval Conference, Clinical record, 030304 developmental biology, 0105 earth and related environmental sciences, media_common
Abstract: As an important branch of the clinical decision support (CDS) track, medical literature retrieval takes an electronic medical record as input and retrieves the needed information from a large amount of medical literature. However, clinical records are semantically complicated and ambiguous, so query reconstruction is necessary for better retrieval performance. In this paper, we take advantage of query quality predictors to process long clinical notes with redundant content, and predict query intent to reconstruct the original query. Experimental results on the standard Text REtrieval Conference (TREC) CDS track dataset confirm the superior performance of the proposed method.
Published: 2019
Full Text: View/download PDF
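The abstract does not name the query quality predictors used. One well-known family is pre-retrieval predictors based on inverse document frequency, such as average IDF; the following minimal sketch uses IDF both to shorten a long clinical note and to score the result. This is an illustrative stand-in, not the paper's method, and the function names, document frequencies, and note text are hypothetical.

```python
import math


def idf(term, doc_freq, num_docs):
    """Smoothed inverse document frequency."""
    return math.log((num_docs + 1) / (doc_freq[term] + 1))


def reduce_query(note_terms, doc_freq, num_docs, k=3):
    """Keep the k distinct in-vocabulary terms of a long note with the highest IDF."""
    # dict.fromkeys de-duplicates while preserving first-seen order.
    vocab_terms = [t for t in dict.fromkeys(note_terms) if t in doc_freq]
    ranked = sorted(vocab_terms, key=lambda t: idf(t, doc_freq, num_docs), reverse=True)
    return ranked[:k]


def avg_idf(query, doc_freq, num_docs):
    """Average IDF of the query terms: a simple pre-retrieval performance predictor."""
    terms = [t for t in query if t in doc_freq]
    if not terms:
        return 0.0
    return sum(idf(t, doc_freq, num_docs) for t in terms) / len(terms)


# Hypothetical document frequencies in a 1,000-article collection.
df = {"patient": 900, "presents": 700, "with": 990, "fever": 120,
      "and": 995, "petechiae": 5, "thrombocytopenia": 12}
note = "patient presents with fever and petechiae patient has thrombocytopenia".split()
reduced = reduce_query(note, df, num_docs=1000)
print(reduced)  # ['petechiae', 'thrombocytopenia', 'fever']
print(avg_idf(reduced, df, 1000) > avg_idf(note, df, 1000))  # True
```

A higher predicted score for the reduced query suggests it is more specific than the full note, which is the kind of signal a query reconstruction step could use to choose between candidate queries.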

50. Self-Attention based Network For Medical Query Expansion

Author: Yun He, Huaying Wu, Liang He, Su Chen, Yang Song, and Qinmin Vivian Hu
Subjects: Information retrieval, Computer science, Text annotation, Information needs, 02 engineering and technology, Clinical decision support system, Convolutional neural network, Term (time), Query expansion, 020204 information systems, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Text Retrieval Conference, Sentence
Abstract: The aim of clinical decision support based on electronic health records is to satisfy physicians’ information needs. We are motivated to propose a self-attention based network for query expansion. Considering the difficulty and cost of medical text annotation, and inspired by the idea of transfer learning, we choose the Semantic Textual Similarity dataset for model training. Unlike previous work, the proposed approach considers not only the score of a single term as an expansion term but also the score of term combinations. Our model utilizes Convolutional Neural Networks (CNN) to obtain sentence representations and a self-attention mechanism for entity representation. With self-attention, it is able to estimate the weight of each entity to learn better representations for all entities. We conduct experiments on three standard datasets of the Text REtrieval Conference Clinical Decision Support Track, where the approach shows promising overall performance over strong baselines.
Published: 2019
Full Text: View/download PDF
