41 results on '"Fuhr, Norbert"'
Search Results
2. XML Documents Clustering by Structures.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Nayak, Richi, and Xu, Sumei
- Abstract
XCLS is a novel clustering algorithm to assemble heterogeneous XML documents by measuring their level similarity with a global criterion function. XCLS does not require the pairwise similarity to be computed between two individual documents; rather, it measures the similarity at clustering level utilising the structural information of XML documents. Quality of the clustering solution depends on the calculation of the level similarity, and whether the level similarity can represent the documents' structural similarity correctly. In this paper, we present the performance of XCLS for clustering the structural descriptions (ordered labeled trees) of XML documents. We have reported 5 sub-tasks corresponding to 5 corpuses as provided by the INEX 2005 document mining track. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
3. Implementation of a High-Speed and High-Precision XML Information Retrieval System on Relational Databases.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Fujimoto, Kei, Shimizu, Toshiyuki, Terada, Norimasa, Hatano, Kenji, Suzuki, Yu, Amagasa, Toshiyuki, Kinutani, Hiroko, and Yoshikawa, Masatoshi
- Abstract
This paper describes an XML information retrieval system that we have developed. It is based on a vector space model, and implemented on top of XRel, a relational XML database system that has been developed in our research group. When a query is processed, a large number of fragments are retrieved, because a single XML document usually contains many XML fragments. Keeping all XML fragments degrades retrieval precision and increases query processing time, because some XML fragments are not appropriate as a query target. In existing methods, retrieval targets are manually selected by human experts when an XML collection is stored in the system. Such manual selection is not feasible when many kinds of XML documents are stored in the system. To cope with the problem we propose a method for automatically selecting document-centric fragments by introducing three measurements, namely, period ratio, number of different words, and empirical rules. By deleting inappropriate data-centric fragments from results of keyword query, we can improve the accuracy and performance of our system. Through performance evaluations, we confirmed the improvement of retrieval precision and query processing speed. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
4. Users Interaction with the Hierarchically Structured Presentation in XML Document Retrieval.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Kim, Heesop, and Son, Heejung
- Abstract
Some changes were made in the interface design of this year's XML documents retrieval system according to the outcomes of the Interactive track in INEX 2004. One of the major changes was the hierarchical structure of the presentation in the search results. The main purpose of our study was to investigate how the hierarchical presentation of interface influences the searchers' behavior in XML document retrieval. To achieve this objective we analyzed the transaction logs from this year's experiment and compared the results to those of last year's experiment. The subjects' comments on the experiment and the system were also examined. The Daffodil XML retrieval system was used and 12 test persons participated in the experiment. SPSS for Windows 12.0 was used for statistical analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
5. Processing Heterogeneous Collections in XML Information Retrieval.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Azevedo, Maria Izabel Menezes, Paixão, Klérisson Vinícius Ribeiro, and Pereira, Diego Vinícius Castro
- Abstract
Our model is based on the observation that the tags used in XML documents are semantically related to the content that they delimit. To evaluate the performance of our approach, we participated in the INEX 2004 heterogeneous track, along with 34 other institutions, from which only 5 groups, including us, submitted runs. In this paper we describe how the approach we used in INEX 2004 and 2005 processes heterogeneous collections without any mapping of DTDs. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
6. Multimedia Strategies for B3-SDR, Based on Principal Component Analysis.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, and Zwol, Roelof
- Abstract
In this article an XML-driven approach for multimedia information retrieval is presented and evaluated, which uses principal component analysis to derive a composite ranking for a set of XML elements that have a multimedia character. The multimedia strategies that implement the PCA module on top of the B3-SDR system allow for the integration of image retrieval with the already present text retrieval modules. Three different strategies are defined. The first strategy implements annotation-based image retrieval, which uses the caption of an image to find related images using a keyword-based search. The second component enables content-based multimedia retrieval by using PCA to derive a composite ranking, which reflects the combined relevance for text and images that are present within an XML element. A simple content-based image retrieval system is built for this purpose, which uses 'query by example'. The last strategy allows for a bidirectional combination of the first two strategies, where the content-based image retrieval component benefits from the additional images retrieved by the annotation-based search, and vice versa. The multimedia strategies are evaluated within the INEX 2005 multimedia track, where based on the Lonelyplanet Worldguide and a set of related topics the retrieval performance is measured in terms of recall and precision. The outcome of the experiment shows that the multimedia strategies have a positive influence on the retrieval performance when compared to the text-based XML retrieval system. However, the PCA component did not yet fully live up to its expectation, which is probably due to the poor performance of the ad hoc built image retrieval system that is used for the experiment. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
7. Combining Image and Structured Text Retrieval.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Iskandar, D. N. F. Awang, Pehcevski, Jovan, Thom, James A., and Tahaghoghi, S. M. M.
- Abstract
Two common approaches in retrieving images from a collection are retrieval by text keywords and retrieval by visual content. However, it is widely recognised that it is impossible for keywords alone to fully describe visual content. This paper reports on the participation of the RMIT University group in the INEX 2005 multimedia track, where we investigated our approach of combining evidence from a content-oriented XML retrieval system and a content-based image retrieval system using a linear combination of evidence. Our approach yielded the best overall result for the INEX 2005 Multimedia track using the standard evaluation measures. We have extended our work by varying the parameter for the linear combination of evidence, and we have also examined the performance of runs submitted by participants by using the newly proposed HiXEval evaluation metric. We show that using CBIR in conjunction with text search leads to better retrieval performance. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
8. Integrating Text Retrieval and Image Retrieval in XML Document Searching.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Tjondronegoro, D., Zhang, J., Gu, J., Nguyen, A., and Geva, S.
- Abstract
Many XML documents contain a mixture of text and images. Images play an important role in webpage or article presentation. However, popular Information Retrieval systems still largely depend on pure text retrieval as it is believed that text descriptions including body text and the caption of images contain precise information. On the other hand, images are more attractive and easier to understand than pure text. We assume that if the image content is used in addition to the pure text-based retrieval, the retrieval result should be better than text-only or image-only retrieval. We test this hypothesis by doing a series of experiments using the Lonely Planet XML document collection. Two search engines, an XML document search engine using both content and structure based on text, and a content-based image search engine were used at the same time. The results generated by these two search engines were merged together to form a new result. This paper presents our current work, initial results and vision into future work. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
9. INEX 2005 Multimedia Track.
- Author
-
Fuhr, Norbert, Malik, Saadia, Zwol, Roelof, Kazai, Gabriella, and Lalmas, Mounia
- Abstract
This paper reports on the activities of the INEX 2005 Multimedia track. The track was successful in realizing its objective to provide a pilot evaluation platform for the evaluation of retrieval strategies for XML-based multimedia documents. In this first exploratory year the focus of the evaluation experiment was to test approaches for the retrieval of XML fragments using a combination of content-based text and image retrieval techniques. The track is set to continue at INEX 2006. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
10. Clustering XML Documents Using Self-organizing Maps for Structures.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Hagenbuchner, M., Sperduti, A., Tsoi, A. C., Trentini, F., Scarselli, F., and Gori, M.
- Abstract
Self-Organizing Maps capable of encoding structured information will be used for the clustering of XML documents. Documents formatted in XML are appropriately represented as graph data structures. It will be shown that the Self-Organizing Maps can be trained in an unsupervised fashion to group XML structured data into clusters, and that this task is scaled in linear time with increasing size of the corpus. It will also be shown that some simple prior knowledge of the data structures is beneficial to the efficient grouping of the XML documents. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
11. Transforming XML Trees for Efficient Classification and Clustering.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Candillier, Laurent, Tellier, Isabelle, and Torre, Fabien
- Abstract
Most of the existing methods we know to tackle datasets of XML documents directly work on the trees representing these XML documents. We investigate in this paper the use of a different kind of representation for the manipulation of XML documents. Our idea is to transform the trees into sets of attribute-values, so as to be able to apply various existing methods of classification and clustering on such data, and benefit from their strengths. We apply this strategy both for the classification task and for the clustering task using the structural description of XML documents alone. For instance, we show that the use of boosted C5 leads to very good results in the classification task of XML documents transformed in this way. The use of SSC in the clustering task benefits from its ability to provide as output an interpretable representation of the clusters found. Finally, we also propose an adaptation of SSC for the classification of XML documents, so that the produced classifier is understandable. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
12. Sequential Pattern Mining for Structure-Based XML Document Classification.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Garboni, Calin, Masseglia, Florent, and Trousse, Brigitte
- Abstract
This article presents an original supervised classification technique for XML documents which is based on structure only. Each XML document is viewed as an ordered labeled tree, represented by its tags only. Our method has three steps. After a cleaning step, we characterize each predefined cluster in terms of frequent structural subsequences. Then we classify the XML documents based on the mined patterns of each cluster. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
13. A Flexible Structured-Based Representation for XML Document Mining.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Vercoustre, Anne-Marie, Fegas, Mounir, Gul, Saba, and Lechevallier, Yves
- Abstract
This paper reports on the INRIA group's approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allows taking into account the structure only or both the structure and content. Our approach consists of representing XML documents by a set of their sub-paths, defined according to some criteria (length, root beginning, leaf ending). By considering those sub-paths as words, we can use standard methods for vocabulary reduction, and simple clustering methods such as k-means. We use an implementation of the clustering algorithm known as dynamic clouds that can work with distinct groups of independent modalities put in separate variables. This is useful in our model since embedded sub-paths are not independent: we split potentially dependent paths into separate variables, resulting in each of them containing independent paths. Experiments with the INEX collections show good results for the structure-only collections, but our approach could not scale well for large structure-and-content collections. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
14. HiXEval: Highlighting XML Retrieval Evaluation.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Pehcevski, Jovan, and Thom, James A.
- Abstract
This paper describes our proposal for an evaluation metric for XML retrieval that is solely based on the highlighted text. We support our decision of ignoring the exhaustivity dimension by undertaking a critical investigation of the two INEX 2005 relevance dimensions. We present a fine grained empirical analysis of the level of assessor agreement of the five topics double-judged at INEX 2005, and show that the agreement is higher for specificity than for exhaustivity. We use the proposed metric to evaluate the INEX 2005 runs for each retrieval strategy of the CO and CAS retrieval tasks. A correlation analysis of the rank orderings obtained by the new metric and two XCG metrics shows that the orderings are strongly correlated, which demonstrates the usefulness of the proposed metric for evaluation of XML retrieval performance. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
15. Relevance Feedback for Structural Query Expansion.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Schenkel, Ralf, and Theobald, Martin
- Abstract
Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to enhance retrieval quality. For keyword-based XML queries, feedback engines usually generate an expanded keyword query from the content of elements marked as relevant or nonrelevant. This approach that is inspired by text-based IR completely ignores the semistructured nature of XML. This paper makes the important step from pure content-based to structural feedback. It presents two independent approaches that include structural dimensions in a feedback-driven query evaluation: The first approach reranks the result list of a keyword-based search engine, using structural features derived from results with known relevance. The second approach expands a keyword query into a full-fledged content-and-structure query with weighted conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
16. What Do Users Think of an XML Element Retrieval System?
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Kamps, Jaap, and Sigurbjörnsson, Börkur
- Abstract
We describe the University of Amsterdam's participation in the INEX 2005 Interactive Track, mainly focusing on a comparative experiment, in which the baseline system Daffodil/HyREX is compared to a home-grown XML element retrieval system (xmlfind). The xmlfind system provides an interface for an XML information retrieval search engine, using an index that contains all the individual XML elements in the IEEE collection. Our main findings are the following. First, test persons show appreciation for both systems, but xmlfind receives higher scores than Daffodil. Second, the interface seems to take the structural dependencies between retrieved elements into account in an appropriate way: although retrieved elements may be overlapping in whole or in part, none of the test persons regarded this as problematic. Third, the general opinion of the test persons on the usefulness of XML retrieval systems was unequivocally positive, and their responses highlight many of the hoped-for advantages of an XML retrieval system. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
17. The Interactive Track at INEX 2005.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Kazai, Gabriella, Larsen, Birger, Malik, Saadia, and Tombros, Anastasios
- Abstract
In its second year, the Interactive Track at INEX focused on addressing some fundamental issues of interactive XML retrieval: is element retrieval useful for searchers, what granularity of elements do searchers find more useful, what applications for element retrieval can be viable in interactive environments, etc. In addition, the track also expanded by offering an alternative document collection, by including two additional tasks, and by attracting more participating groups: a total of 11 research groups and 119 test persons participated in the three different tasks that were included in the track. In this paper, we describe the main issues that the Interactive Track at INEX 2005 attempts to address and the methodology and tasks that were used in the track. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
18. XML Retrieval Based on Direct Contribution of Query Components.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, and Hubert, Gilles
- Abstract
This paper describes the retrieval approach proposed by the SIG/EVI group of the IRIT research centre at INEX'2005. This XML approach is based on direct contribution of the components constituting an information need. This paper focuses on how the method has evolved since our previous participation in INEX. It describes the official experiments done for each subtask with the corresponding results and additional unofficial experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
19. Searching XML Documents - Preliminary Work.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Hassler, Marcus, and Bouchachia, Abdelhamid
- Abstract
Structured document retrieval aims at exploiting the structure together with the content of documents to improve retrieval results. Several aspects of traditional information retrieval applied on flat documents have to be reconsidered. These include in particular, document representation, storage, indexing, retrieval, and ranking. This paper outlines the architecture of our system and the adaptation of the standard vector space model to achieve focussed retrieval. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
20. NLPX at INEX 2005.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Woodley, Alan, and Geva, Shlomo
- Abstract
XML information retrieval (XML-IR) systems aim to provide users with highly exhaustive and highly specific results. To interact with XML-IR systems users must express both their content and structural needs in the form of a structured query. Historically, these structured queries have been formatted using formal languages such as XPath or NEXI. Unfortunately, formal query languages are very complex and too difficult to be used by experienced, let alone casual, users and are too closely bound to the underlying physical structure of the collection. Hence, recent research has investigated the idea of specifying users' content and structural requirements via natural language queries (NLQs). The NLP track was established at INEX 2004 to promote research into this area, and QUT participated with the system NLPX. Here, we discuss changes we've made to the system since last year, as well as our participation in INEX 2005. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
21. Machine Learning Ranking and INEX'05.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Vittaut, Jean-Noël, and Gallinari, Patrick
- Abstract
We present a Machine Learning based ranking model which can automatically learn its parameters using a training set of annotated examples composed of queries and relevance judgments on a subset of the document elements. Our model improves the performance of a baseline Information Retrieval system by optimizing a ranking loss criterion and combining scores computed from doxels and from their local structural context. We analyze the performance of our algorithm on CO-Focussed and CO-Thorough tasks and compare it to the baseline model which is an adaptation of Okapi to Structured Information Retrieval. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
22. SIRIUS: A Lightweight XML Indexing and Approximate Search System at INEX 2005.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Popovici, Eugen, Ménier, Gildas, and Marteau, Pierre-François
- Abstract
This paper reports on SIRIUS, a lightweight indexing and search engine for XML documents. The retrieval approach implemented is document oriented. It involves an approximate matching scheme of the structure and textual content. Instead of managing the matching of whole DOM trees, SIRIUS splits the document object model into a set of paths. In this view, the request is a path-like expression with conditions on the attribute values. In this paper, we first present the main functionalities and characteristics of this XML IR system and then relate our experience in adapting and using it for the INEX 2005 ad-hoc retrieval task. Finally, we present and analyze the SIRIUS retrieval performance obtained during the INEX 2005 evaluation campaign and show that despite the lightweight characteristics of SIRIUS we were able to retrieve highly relevant non-overlapping XML elements and obtained quite good precision at low recall values. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
23. RMIT University at INEX 2005: Ad Hoc Track.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Pehcevski, Jovan, Thom, James A., and Tahaghoghi, S. M. M.
- Abstract
Different scenarios of XML retrieval are analysed in the INEX 2005 ad hoc track, which reflect different query interpretations and user behaviours that may be observed during XML retrieval. The RMIT University group's participation in the INEX 2005 ad hoc track investigates these XML retrieval scenarios. Our runs follow a hybrid XML retrieval approach that combines three information retrieval models with two ways of identifying the appropriate element granularity and two XML-specific heuristics to rank the final answers. We observe different behaviours when applying our hybrid approach to the different retrieval scenarios, suggesting that the optimal retrieval parameters are highly dependent on the nature of the XML retrieval task. Importantly, we show that using structural hints in content-only topics is a useful feature that leads to more precise search, but only when the level of overlap among the retrieved elements is considered by the evaluation metric. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
24. When a Few Highly Relevant Answers Are Enough.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, and Lehtonen, Miro
- Abstract
Our XML retrieval system EXTIRP was slightly modified from the 2004 version for the INEX 2005 project. For the first time, the system is now completely independent of the document type of the XML documents in the collection, which justifies the use of the term "heterogeneous" when describing our methodology. Nevertheless, the 2005 version of EXTIRP is still an incomplete system that does not include query expansion or dynamic determination of the answer size. The latter is seen as a serious limitation because of the XCG-based metrics which favour systems that can adjust the size of the answer according to its relevance to the query. We put our main focus on the CO.Focussed task of the adhoc track although runs were submitted for other tasks as well. Perhaps because of the incompleteness of our system, the initial results bring out the characteristics of our system better than in earlier years. Even when partially stripped, EXTIRP is capable of ranking the most obvious highly relevant answers at the top ranks better than many other systems. The relatively high precision at the top ranks is achieved at the cost of losing sight of the marginally relevant content, which shows in some exceptionally steep curves, and in rankings among other systems that sink from the top ranks at low recall levels towards the bottom ranks at higher levels of recall. Another fact supporting our observation is that regardless of the metric, our runs are ranked higher with the strict quantisation than with any other quantisation function. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
25. TopX and XXL at INEX 2005.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Theobald, Martin, Schenkel, Ralf, and Weikum, Gerhard
- Abstract
We participated with two different and independent search engines in this year's INEX round: the XXL Search Engine and the TopX engine. As this is the first participation for TopX, this paper focuses on the design principles, scoring, query evaluation and results of TopX. We briefly discuss the results with XXL afterwards. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
26. The Dynamic Retrieval of XML Elements.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Crouch, Carolyn J., Khanna, Sudip, Potnis, Poorva, and Doddapaneni, Nagendra
- Abstract
This paper describes the current state of our system for structured retrieval. The system itself is based on an extension of the vector space model initially proposed by Fox [5]. The basic functions are performed using the Smart experimental retrieval system [10]. The major advance in our system this year is the incorporation of a facility for the dynamic retrieval of elements, which we refer to as flexible retrieval. This approach allows the system to return a rank-ordered list of elements based on a single indexing of the collection at the paragraph level. Lnu term weights [12,13] are generated dynamically along with the elements themselves, thus eliminating the need for propagation. Experimental results using this technique on INEX 2006 data show that it can produce results competitive with those produced by retrieval on an all-element index of the collection (and in fact produces virtually identical results for the new Fetch-and-Browse task). Early relevance feedback results are also reported. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
27. XFIRM at INEX 2005: Ad-Hoc and Relevance Feedback Tracks.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Sauvagnat, Karen, Hlaoua, Lobna, and Boughanem, Mohand
- Abstract
This paper describes experiments carried out with the XFIRM system in the INEX 2005 framework. The XFIRM system uses a relevance propagation method to answer CO and CAS queries. Runs were submitted to the ad-hoc and relevance feedback tracks. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
28. GPX - Gardens Point XML IR at INEX 2005.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, and Geva, Shlomo
- Abstract
The INEX 2005 evaluation consisted of numerous tasks that required different approaches. In this paper we describe the approach that we adopted to satisfy the requirements of all the tasks, CAS and CO, in Thorough, Focused, and Fetch Browse mode, using the same underlying system. The retrieval approach is based on the construction of a collection sub-tree, consisting of all nodes that contain one or more of the search terms. Nodes containing search terms are then assigned a score using a TF_IDF variant, scores are propagated upwards in the document XML tree, and finally all XML elements are ranked. We present results that demonstrate that the approach is versatile and produces consistently good performance across all INEX 2005 tasks. Keywords: XML Information Retrieval, XML Search Engine, Inverted Files, XML-IR, Focused retrieval. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
29. Probabilistic Retrieval, Component Fusion and Blind Feedback for XML Retrieval.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, and Larson, Ray R.
- Abstract
This paper describes the retrieval approaches used by UC Berkeley in our official submissions for the various Adhoc tasks. As in previous INEX evaluations, the main technique we are testing is the fusion of multiple probabilistic searches against different XML components using different probabilistic retrieval algorithms. In addition this year we began to use a different fusion/combination method from previous years. This year we also continued to use re-estimated Logistic Regression (LR) parameters for different components of the IEEE document collection, estimated using relevance judgements from the INEX 2003 evaluation. All of our runs were fully automatic with no manual editing or interactive submission of queries, and all used only the title elements of the INEX topics. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
30. Parameter Estimation for a Simple Hierarchical Generative Model for XML Retrieval.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Ogilvie, Paul, and Callan, Jamie
- Abstract
This paper explores the possibility of using a modified Expectation-Maximization algorithm to estimate parameters for a simple hierarchical generative model for XML retrieval. The generative model for an XML element is estimated by linearly interpolating statistical language models estimated from the text of the element, the parent element, the document element, and its children elements. We heuristically modify EM to allow the incorporation of negative examples, then attempt to maximize the likelihood of the relevant components while minimizing the likelihood of non-relevant components found in training data. The technique for incorporation of negative examples provides an effective algorithm to estimate the parameters in the linear combination mentioned. Some experiments are presented on the CO.Thorough task that support these claims. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
31. The University of Kaiserslautern at INEX 2005.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, and Dopichaj, Philipp
- Abstract
Digital libraries offer convenient access to large volumes of text, but finding the information that is relevant for a given information need is hard. The workshops of the Initiative for the Evaluation of XML Retrieval (INEX) provide a forum for testing the effectiveness of retrieval strategies. In this paper, we present the two strategies used by the University of Kaiserslautern at INEX 2005: The first method uses background knowledge about the document schema (element relationships) to support queries with structural constraints. The second method exploits structural patterns in the retrieval results to find the appropriate results among overlapping elements. In the evaluation of the official results from the workshop, we find that element relationships do not improve retrieval quality for the test collection, but that patterns can lead to improved early precision. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
32. Using the INEX Environment as a Test Bed for Various User Models for XML Retrieval.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Mass, Yosi, and Mandelbrod, Matan
- Abstract
While in previous INEX workshops XML retrieval tasks were divided roughly into CO (Content Only) and CAS (Content and Structure) tasks, the focus this year was to further refine those tasks so as to experiment with different user behaviors for viewing returned results. Of particular interest is the new "Focussed" task that permits a single element along each path, thus solving the problem of XML result overlapping that we experimented with in previous INEX workshops. In this paper we describe an algorithm for the new "Focussed" task as well as our algorithms and approaches for the other tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
33. EPRUM Metrics and INEX 2005.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, and Piwowarski, Benjamin
- Abstract
Standard Information Retrieval (IR) metrics are not well suited for new paradigms like XML IR in which retrievable information units are document elements. These units are neither predefined nor independent, and the elements returned by IR systems may overlap and contain near misses. Part of the problem stems from the classical hypotheses on user behaviour, which do not take into account the structural or logical context of document elements or the possibility of navigation between retrievable units. The Expected Precision Recall with User Model (EPRUM) metric is based on a more realistic user model which encompasses a large variety of user behaviours. In this paper, we present the EPRUM metric used for evaluating the official submissions of INEX 2005 and detail the settings we used. We do not present the full derivation of the EPRUM metric but we give a thorough example of its computation along with the complete set of formulas needed to compute precision at different recall values. We also discuss the implications of such a metric for several key problems of XML Information Retrieval, such as the notion of the ideal list and the problem of overlap. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
34. Field-Weighted XML Retrieval Based on BM25.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Lu, Wei, Robertson, Stephen, and MacFarlane, Andrew
- Abstract
This is the first year that the Centre for Interactive Systems Research has participated in INEX. Based on a newly developed XML indexing and retrieval system on Okapi, we extend Robertson's field-weighted BM25F for document retrieval to an element-level retrieval function, BM25E. In this paper, we introduce this new function and our experimental method in detail, and then show how we tuned weights for our selected fields by using INEX 2004 topics and assessments. Based on the tuned models, we submitted our runs for CO.Thorough and CO.FetchBrowse; the methods we propose show real promise. Existing problems and future work are also discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
35. B3-SDR and Effective Use of Structural Hints.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, and Zwol, Roelof
- Abstract
The focus in this article is on the use of structural hints to increase the retrieval performance of models for structured document retrieval. Based on an effective model for structured document retrieval for 'content only' queries, two extensions are defined that allow the retrieval model to include structural hints provided by the user into the retrieval process. The underlying hypothesis states that if the user is capable of providing structural clues, besides the content-based criteria of his/her information need, the retrieval performance can be increased. To test this hypothesis the two extensions are evaluated using a selection of the retrieval tasks defined for the INEX 2005 Ad-hoc track. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
36. Query Evaluation with Structural Indices.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Arvola, Paavo, Kekäläinen, Jaana, and Junkkari, Marko
- Abstract
This paper describes the retrieval methods of the TRIX system based on structural indices utilizing the natural tree structure of XML. We show how these indices can be employed in the processing of CO as well as CAS queries, which makes it easy for variations of CAS queries to be processed. Results at INEX 2005 are discussed including the following tasks: CO.Focussed, CO.FetchBrowse, CO.Thorough and all of the CAS tasks. While creating result lists, two different overlapping models have been applied according to the task. The weights of the ancestors of an element have been taken into account in re-weighting in order to get more evidence about relevance. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
37. INEX 2005 Evaluation Measures.
- Author
-
Fuhr, Norbert, Malik, Saadia, Kazai, Gabriella, and Lalmas, Mounia
- Abstract
This paper describes the official measures of retrieval effectiveness employed in INEX 2005: the eXtended Cumulated Gain (XCG) measures. In addition, results of correlation analysis are reported, examining the correlation between the employed quantisation functions and the different measures for the INEX 2005 ad-hoc tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
38. The Effect of Structured Queries and Selective Indexing on XML Retrieval.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Sigurbjörnsson, Börkur, and Kamps, Jaap
- Abstract
We describe the University of Amsterdam's participation in the INEX 2005 ad hoc track, covering the Thorough, Focused, and FetchBrowse tasks and their structured (+S) counterparts. Our research questions for this round of INEX were threefold. Our first and main research question was to investigate the contribution of structural constraints to improved retrieval performance. Our main results were that the two types of structural constraints have different effects. Constraining the target of result elements gives improvements in terms of early precision. Constraining the context of result elements improves mean average precision. Our second research question was to experiment with selective indexing strategies based on either the length of elements, the tag-name of elements considered relevant in earlier INEX years, or simply by indexing all sections or articles. Our experiments show that disregarding 80-90% of the total number of elements does not decrease retrieval performance. Third, we considered the automatic creation of structured queries using blind feedback. Here, our results are inconclusive, mainly due to the small number of queries used and the lack of a comparison to traditional blind feedback. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
39. TIJAH Scratches INEX 2005: Vague Element Selection, Image Search, Overlap, and Relevance Feedback.
- Author
-
Fuhr, Norbert, Lalmas, Mounia, Malik, Saadia, Kazai, Gabriella, Mihajlović, Vojkan, Ramírez, Georgina, Westerveld, Thijs, Hiemstra, Djoerd, Blok, Henk Ernst, and Vries, Arjen P.
- Abstract
Retrieving information from heterogeneous data sources in a flexible manner and within a single (database) framework is still a challenge. In this paper we present several extensions of our prototype database system TIJAH developed for structured retrieval. The extensions are aimed at modeling vague selection of XML elements and image retrieval. All three levels (conceptual, logical, and physical) of the TIJAH system are enhanced to support the extensions. Additionally, we analyze different ways of removing overlap and explain how structural information can be used for relevance feedback. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
40. The Interpretation of CAS.
- Author
-
Fuhr, Norbert, Malik, Saadia, Kazai, Gabriella, Trotman, Andrew, and Lalmas, Mounia
- Abstract
There has been much debate over how to interpret the structure in queries that contain structural hints. At INEX 2003 and 2004, there were two interpretations: SCAS, in which the user-specified target element was interpreted strictly, and VCAS, in which it was interpreted vaguely. But how many ways are there that the query could be interpreted? In the investigation at INEX 2005 (discussed herein) four different interpretations were proposed, and compared on the same queries. Those interpretations (SSCAS, SVCAS, VSCAS, and VVCAS) are the four interpretations possible by interpreting the target elements, and the support elements, either strictly or vaguely. An analysis of the submitted runs shows that those that share an interpretation of the target element correlate - that is, the previous decision to divide CAS into SCAS and VCAS (as done at INEX 2003 and 2004) was sound. The analysis is supported by the fact that the best performing VSCAS run was submitted to the VVCAS task and the best performing SVCAS run was submitted to the SSCAS task. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
41. Overview of INEX 2005.
- Author
-
Malik, Saadia, Kazai, Gabriella, Lalmas, Mounia, and Fuhr, Norbert
- Abstract
Since 2002, INEX has been working towards the goal of establishing an infrastructure, in the form of a large XML test collection and appropriate scoring methods, for the evaluation of content-oriented XML retrieval systems. This paper provides an overview of the work carried out as part of INEX 2005. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF