Descriptor: "INFORMATION retrieval research" / Topic: data mining - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"INFORMATION retrieval research"' showing total 22 results

1. Building subject-aligned comparable corpora and mining it for truly parallel sentence pairs.

Author: Wołk, Krzysztof and Marasek, Krzysztof
Subjects: SENTENCES (Grammar), CORPORA, DATA mining, INFORMATION retrieval research, MACHINE translating
Abstract: Parallel sentences are a relatively scarce but extremely useful resource for many applications including cross-lingual retrieval and statistical machine translation. This research explores our methodology for mining such data from previously obtained comparable corpora. The task is highly practical since non-parallel multilingual data exist in far greater quantities than parallel corpora, but parallel sentences are a much more useful resource. Here we propose a web crawling method for building subject-aligned comparable corpora from Wikipedia articles. We also introduce a method for extracting truly parallel sentences that are filtered out from noisy or just comparable sentence pairs. We describe our implementation of a specialized tool for this task as well as training and adaption of a machine translation system that supplies our filter with additional information about the similarity of comparable sentence pairs. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

2. DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.

Author: Shengwen Peng, Ronghui You, Hongning Wang, Chengxiang Zhai, Hiroshi Mamitsuka, and Shanfeng Zhu
Subjects: *INFORMATION retrieval research, *DATA mining, *MEDICAL subject headings, *SUBJECT headings, *MEDICAL informatics
Abstract: Motivation: Medical Subject Headings (MeSH) indexing, which is to assign a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and MeSH side. For the citation side, all existing methods, including Medical Text Indexer (MTI) by National Library of Medicine and the state-of-the-art method, MeSHLabeler, deal with text by bag-of-words, which cannot capture semantic and context-dependent information well. Methods: We propose DeepMeSH that incorporates deep semantic information for large-scale MeSH indexing. It addresses the two challenges in both citation and MeSH sides. The citation side challenge is solved by a new deep semantic representation, D2V-TFIDF, which concatenates both sparse and dense semantic representations. The MeSH side challenge is solved by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation. Results: DeepMeSH achieved a Micro F-measure of 0.6323, 2% higher than 0.6218 of MeSHLabeler and 12% higher than 0.5637 of MTI, for BioASQ3 challenge data with 6000 citations. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF
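
Note on the metric: the abstract above reports results as a Micro F-measure. The following is a minimal sketch of how a micro-averaged F-measure is commonly computed for multi-label indexing, pooling true positives over all citations before taking precision and recall. The function name and the toy heading sets are illustrative assumptions and are not taken from the paper or from BioASQ data.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure over parallel lists of gold and predicted label sets."""
    # Pool counts over all citations before computing precision and recall.
    true_positives = sum(len(g & p) for g, p in zip(gold, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / n_predicted
    recall = true_positives / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): two citations with gold and predicted headings.
gold = [{"Humans", "Neoplasms"}, {"Mice", "Genomics", "Animals"}]
pred = [{"Humans", "Algorithms"}, {"Mice", "Animals"}]
score = micro_f_measure(gold, pred)
```

Here 3 of 4 predicted headings are correct and 3 of 5 gold headings are found, so precision is 0.75 and recall is 0.6.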

3. Modality-Dependent Cross-Media Retrieval.

Author: Wei, Yunchao, Zhao, Yao, Zhu, Zhenfeng, Wei, Shikui, Xiao, Yanhui, Feng, Jiashi, and Yan, Shuicheng
Subjects: *INFORMATION retrieval research, *CONTENT-based image retrieval, *TEXT mining, *DATA mining, *METADATA mapping
Abstract: In this article, we investigate the cross-media retrieval between images and text, that is, using image to search text (I2T) and using text to search images (T2I). Existing cross-media retrieval methods usually learn one couple of projections, by which the original features of images and text can be projected into a common latent space to measure the content similarity. However, using the same projections for the two different retrieval tasks (I2T and T2I) may lead to a tradeoff between their respective performances, rather than their best performances. Different from previous works, we propose a modality-dependent cross-media retrieval (MDCR) model, where two couples of projections are learned for different cross-media retrieval tasks instead of one couple of projections. Specifically, by jointly optimizing the correlation between images and text and the linear regression from one modal space (image or text) to the semantic space, two couples of mappings are learned to project images and text from their original feature spaces into two common latent subspaces (one for I2T and the other for T2I). Extensive experiments show the superiority of the proposed MDCR compared with other methods. In particular, based on the 4,096-dimensional convolutional neural network (CNN) visual feature and 100-dimensional Latent Dirichlet Allocation (LDA) textual feature, the proposed method achieves an mAP score of 41.5%, which is a new state-of-the-art performance on the Wikipedia dataset. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF
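
Note on the metric: the abstract above reports mean average precision (mAP). The sketch below shows one common way to compute it from ranked relevance judgments. It assumes every relevant item appears somewhere in the ranked list and uses no rank cutoff; the paper's exact evaluation protocol is not stated in the abstract, and the names and toy data are illustrative only.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance is a list of booleans, True where the item at that
    rank is relevant. Assumes all relevant items appear in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    return sum(average_precision(q) for q in queries) / len(queries)


# Toy data (hypothetical): relevance of the ranked results for two queries.
queries = [[True, False, True], [False, True]]
map_score = mean_average_precision(queries)
```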

4. An automated approach to quantifying functional interactions by mining large-scale product specification data.

Author: Kang, Sung Woo and Tucker, Conrad
Subjects: *TEXT mining, *DATA mining, *INFORMATION retrieval research, *PRODUCT design, *INDUSTRIAL design
Abstract: The authors of this work hypothesise that the semantic relationship between modules' functional descriptions is correlated with the functional interaction between the modules. A deeper comprehension of the functional interactions between modules enables designers to integrate complex systems during the early stages of the product design process. Existing approaches that measure functional interactions between modules rely on the manual provision of designers' expert analyses, which may be time consuming and costly. The increased quantity and complexity of products in the twenty-first century further exacerbates these challenges. This work proposes an approach to automatically quantify the functional interactions between modules, based on their textual technical descriptions. Compared with manual analyses by design experts who use traditional design structure matrix approaches, the text-mining-driven methodology discovers similar functional interactions, while maintaining comparable accuracies. The case study presented in this work analyses an automotive climate control system and compares the functional interaction solutions achieved by a traditional design team with those achieved following the methodology outlined in this paper. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

5. Latent entity space: a novel retrieval approach for entity-bearing queries.

Author: Liu, Xitong and Fang, Hui
Subjects: *INFORMATION retrieval research, *DATA mining, *AUTOMATIC extracting (Information science), *DATA science, *DATABASE searching
Abstract: Analysis on Web search query logs has revealed that there is a large portion of entity-bearing queries, reflecting the increasing demand of users on retrieving relevant information about entities such as persons, organizations, products, etc. In the meantime, significant progress has been made in Web-scale information extraction, which enables efficient entity extraction from free text. Since an entity is expected to capture the semantic content of documents and queries more accurately than a term, it would be interesting to study whether leveraging the information about entities can improve the retrieval accuracy for entity-bearing queries. In this paper, we propose a novel retrieval approach, i.e., latent entity space (LES), which models the relevance by leveraging entity profiles to represent semantic content of documents and queries. In the LES, each entity corresponds to one dimension, representing one semantic relevance aspect. We propose a formal probabilistic framework to model the relevance in the high-dimensional entity space. Experimental results over TREC collections show that the proposed LES approach is effective in capturing latent semantic content and can significantly improve the search accuracy of several state-of-the-art retrieval models for entity-bearing queries. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

6. An overview on the exploitation of time in collaborative filtering.

Author: Vinagre, João, Jorge, Alípio Mário, and Gama, João
Subjects: *INFORMATION filtering, *INFORMATION retrieval research, *DATA mining, *ALGORITHMS, *DATA science, *MATHEMATICAL models
Abstract: Classic Collaborative Filtering (CF) algorithms rely on the assumption that data are static and we usually disregard the temporal effects in natural user-generated data. These temporal effects include user preference drifts and shifts, seasonal effects, inclusion of new users, and items entering the system (and old ones leaving), user and item activity rate fluctuations and other similar time-related phenomena. These phenomena continuously change the underlying relations between users and items that recommendation algorithms essentially try to capture. In the past few years, a new generation of CF algorithms has emerged, using the time dimension as a key factor to improve recommendation models. In this overview, we present a comprehensive analysis of these algorithms and identify important challenges to be faced in the near future. WIREs Data Mining Knowl Discov 2015, 5:195-215. doi: 10.1002/widm.1160 [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

7. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining.

Author: Sadesh, S. and Suganthe, R. C.
Subjects: AUTOMATIC extracting (Information science), DATA mining, INFORMATION retrieval research, INFORMATION services research, SEARCH engines
Abstract: Web with tremendous volume of information retrieves result for user related queries. With the rapid growth of web page recommendation, results retrieved based on data mining techniques did not offer higher performance filtering rate because relationships between user profile and queries were not analyzed in an extensive manner. At the same time, existing user profile based prediction in web data mining is not exhaustive in producing personalized result rate. To improve the query result rate on dynamics of user behavior over time, Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. HFRS-UQP framework is split into two processes, where filtering and switching are carried out. The data mining based filtering in our research work uses the Hamilton Filtering framework to filter user result based on personalized information on automatic updated profiles through search engine. Maximized result is fetched, that is, filtered out with respect to user behavior profiles. The switching performs accurate filtering updated profiles using regime switching. The updating in profile change (i.e., switches) regime in HFRS-UQP framework identifies the second- and higher-order association of query result on the updated profiles. Experiment is conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

8. Combining hybrid rule ordering strategies based on netconf and a novel satisfaction mechanism for CAR-based classifiers.

Author: Hernández-León, R., Carrasco-Ochoa, Jesús A., Martínez-Trinidad, José Fco., and Hernández-Palancar, J.
Subjects: *IMAGE retrieval, *INFORMATION retrieval research, *IMAGE storage & retrieval systems, *BIG data, *DATA mining
Abstract: In Associative Classification, building a classifier based on Class Association Rules (CARs) consists in finding an ordered CAR list by applying a rule ordering strategy, and selecting a satisfaction mechanism to determine the class of unseen transactions. In this paper, we introduce four novel hybrid rule ordering strategies; the first three combine the Netconf measure with different Support-Confidence based rule ordering strategies. The fourth strategy combines the Netconf measure with a rule ordering strategy based on the CAR's size. Additionally, we combine the proposed strategies with a novel "Dynamic K" satisfaction mechanism. Experiments over several datasets show that the proposed rule ordering strategies jointly with the "Dynamic K" satisfaction mechanism allow improving the performance of CAR-based classifiers. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

9. EXPLAINING DATA-DRIVEN DOCUMENT CLASSIFICATIONS.

Author: Martens, David and Provost, Foster
Subjects: *DOCUMENT classification (Electronic documents), *TEXT mining, *DATA mining, *INFORMATION retrieval research, *DATA quality, *BUSINESS models
Abstract: Many document classification applications require human understanding of the reasons for data-driven classification decisions by managers, client-facing employees, and the technical team. Predictive models treat documents as data to be classified, and document data are characterized by very high dimensionality, often with tens of thousands to millions of variables (words). Unfortunately, due to the high dimensionality, understanding the decisions made by document classifiers is very difficult. This paper begins by extending the most relevant prior theoretical model of explanations for intelligent systems to account for some missing elements. The main theoretical contribution is the definition of a new sort of explanation as a minimal set of words (terms, generally), such that removing all words within this set from the document changes the predicted class from the class of interest. We present an algorithm to find such explanations, as well as a framework to assess such an algorithm's performance. We demonstrate the value of the new approach with a case study from a real-world document classification task: classifying web pages as containing objectionable content, with the goal of allowing advertisers to choose not to have their ads appear on those pages. A second empirical demonstration on news-story topic classification shows the explanations to be concise and document-specific, and to be capable of providing understanding of the exact reasons for the classification decisions, of the workings of the classification models, and of the business application itself. We also illustrate how explaining the classifications of documents can help to improve data quality and model performance. [ABSTRACT FROM AUTHOR]
Published: 2014
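
Illustration: the abstract above defines an explanation as a minimal set of words whose removal changes the predicted class. The sketch below makes that definition concrete with a brute-force search over word subsets of increasing size. It is not the authors' algorithm, which the abstract does not describe; the toy linear classifier, its weights, and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical word weights for a toy linear classifier (not from the paper).
WEIGHTS = {"casino": 2.0, "bet": 1.0, "win": 0.5, "news": -1.0, "weather": -1.0}


def predict(words):
    """Return True (the class of interest) when the summed word weights are positive."""
    return sum(WEIGHTS.get(w, 0.0) for w in words) > 0


def minimal_explanation(words):
    """Smallest set of words whose removal changes the predicted class, or None."""
    original = predict(words)
    vocab = sorted(set(words))
    # Try subsets in order of increasing size so the first hit is minimal.
    for size in range(1, len(vocab) + 1):
        for subset in combinations(vocab, size):
            remaining = [w for w in words if w not in subset]
            if predict(remaining) != original:
                return set(subset)
    return None


doc = ["casino", "bet", "win", "news", "weather"]
explanation = minimal_explanation(doc)
```

Exhaustive search is exponential in the number of distinct words, so it is only practical for very short documents; it is used here purely to show what the definition asks for.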

10. LEVERAGING PHILANTHROPIC BEHAVIOR FOR CUSTOMER SUPPORT: THE CASE OF USER SUPPORT FORUMS.

Author: Jabr, Wael, Mookerjee, Radha, Yong Tan, and Mookerjee, Vijay S.
Subjects: *CUSTOMER relations, *TEXT mining, *INFORMATION technology research, *RESEARCH on Internet users, *DATA mining, *INFORMATION retrieval research
Abstract: Online user forums for technical support are being widely adopted by IT firms to supplement traditional customer support channels. Customers benefit from having an additional means of product support, while firms benefit by lowering the costs of supporting a large customer base. Typically these forums are populated with content generated by users, consisting of questioners (solution seekers) and solvers (solution providers). While questioners can be expected to keep returning as long as they can find answers, firms must employ different means in order to recognize and encourage the contributions of solvers. We identify and compare the impact of two widely adopted recognition mechanisms on the philanthropic behavior of solvers. In the first mechanism, feedback-based recognition, solver contribution is evaluated by questioners. In the second mechanism, quantity-based recognition, all contributions are weighted equally regardless of questioner feedback. We draw on the pro-social behavior literature to identify four drivers of solver contribution: (1) peer recognition, (2) image motivation, (3) social comparison, and (4) social exposure. We show that the choice of recognition mechanism strongly influences a solver's problem-solving behavior, highlighting the importance of the firm's decision in this regard. We address issues of solvers self-selecting a type of recognition mechanism by using propensity score analysis in order to show that solver behavior is a result of forum conditioning. We also study the impact of the recognition mechanism on forum quality and the effectiveness of support to draw comparative analytics. [ABSTRACT FROM AUTHOR]
Published: 2014

11. Searching Dimension Incomplete Databases.

Author: Cheng, Wei, Jin, Xiaoming, Sun, Jian-Tao, Lin, Xuemin, Zhang, Xiang, and Wang, Wei
Subjects: *MISSING data (Statistics), *DATABASES, *QUERY (Information retrieval system), *DATA mining, *INFORMATION retrieval research, *DIMENSIONS, *ACQUISITION of data
Abstract: Similarity query is a fundamental problem in database, data mining and information retrieval research. Recently, querying incomplete data has attracted extensive attention as it poses new challenges to traditional querying techniques. The existing work on querying incomplete data addresses the problem where the data values on certain dimensions are unknown. However, in many real-life applications, such as data collected by a sensor network in a noisy environment, not only the data values but also the dimension information may be missing. In this work, we propose to investigate the problem of similarity search on dimension incomplete data. A probabilistic framework is developed to model this problem so that the users can find objects in the database that are similar to the query with probability guarantee. Missing dimension information poses great computational challenge, since all possible combinations of missing dimensions need to be examined when evaluating the similarity between the query and the data objects. We develop the lower and upper bounds of the probability that a data object is similar to the query. These bounds enable efficient filtering of irrelevant data objects without explicitly examining all missing dimension combinations. A probability triangle inequality is also employed to further prune the search space and speed up the query process. The proposed probabilistic framework and techniques can be applied to both whole and subsequence queries. Extensive experimental results on real-life data sets demonstrate the effectiveness and efficiency of our approach. [ABSTRACT FROM PUBLISHER]
Published: 2014
Full Text: View/download PDF

12. A hybrid approach to Arabic named entity recognition.

Author: Shaalan, Khaled and Oudah, Mai
Subjects: *MACHINE learning, *INFORMATION retrieval research, *NATURAL language processing, *ARABIC language, *DATA mining
Abstract: In this paper, we propose a hybrid named entity recognition (NER) approach that takes the advantages of rule-based and machine learning-based approaches in order to improve the overall system performance and overcome the knowledge elicitation bottleneck and the lack of resources for underdeveloped languages that require deep language processing, such as Arabic. The complexity of Arabic poses special challenges to researchers of Arabic NER, which is essential for both monolingual and multilingual applications. We used the hybrid approach to develop an Arabic NER system that is capable of recognizing 11 types of Arabic named entities: Person, Location, Organization, Date, Time, Price, Measurement, Percent, Phone Number, ISBN and File Name. Extensive experiments were conducted using decision trees, Support Vector Machines and logistic regression classifiers to evaluate the system performance. The empirical results indicate that the hybrid approach outperforms both the rule-based and the ML-based approaches when they are processed independently. More importantly, our system outperforms the state-of-the-art of Arabic NER in terms of accuracy when applied to ANERcorp standard dataset, with F-measures 0.94 for Person, 0.90 for Location and 0.88 for Organization. [ABSTRACT FROM PUBLISHER]
Published: 2014
Full Text: View/download PDF

13. A Critical Review of K Means Text Clustering Algorithms.

Author: Kwale, Francis Musembi
Subjects: TEXT mining, DATA mining, INFORMATION retrieval research, DOCUMENT clustering, ELECTRONIC file management
Abstract: Text clustering is a text mining technique used to group text documents into groups (or clusters) based on similarity of content. This organization (i.e. clustering) is so as to make documents more understandable and easier to search the relevant information, easier to process, and even more efficient in utilizing communication bandwidth and storage space. An example is clustering results of a web search engine operation into groups of similar documents. Many text clustering algorithms have been developed using different approaches, but none can be said to be the best. The choice of a particular algorithm is a big issue to text clustering system developers. K Means is arguably the most popular text clustering algorithm. However, just like the others, it must be having its own weaknesses. In this paper, we explore the K Means algorithm as well as its variants and discuss their appropriateness in text clustering. We describe the characteristics of the algorithms accompanied by some examples and illustrations in an attempt to discover the strengths and weaknesses. The paper thus gives an in depth view of the K Means algorithms, discusses the appropriateness of the algorithms, and also gives guidance to researchers of text mining concerning the choice of K Means for text clustering. [ABSTRACT FROM AUTHOR]
Published: 2013
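
Illustration: the abstract above discusses K Means for text clustering. The following is a minimal sketch of the basic algorithm (Lloyd's iteration) on bag-of-words vectors, assuming squared Euclidean distance and a fixed, arbitrary choice of starting centroids. The documents and helper names are illustrative and are not taken from the paper, which also covers variants not shown here.

```python
from collections import Counter


def bag_of_words(text, vocab):
    """Term-count vector for text over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, max_iter=20):
    """Alternate assignment and centroid update until assignments stop changing."""
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


docs = ["cat dog cat", "dog cat", "stock market", "market stock stock"]
vocab = sorted({w for d in docs for w in d.split()})
points = [bag_of_words(d, vocab) for d in docs]
# Deterministic start: seed with the first and third documents (an arbitrary choice).
labels = k_means(points, [list(points[0]), list(points[2])])
```

The result depends on the starting centroids, which is one of the commonly cited weaknesses of the algorithm.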

14. A Survey on Preprocessing in Text Mining.

Author: Anadakumar, K. and Padmavathy, V.
Subjects: DATABASE management, TEXT mining, DATA mining, INFORMATION retrieval research, DATABASE searching
Abstract: Nowadays, information is stored electronically in databases. Extracting reliable, unknown and useful information from this abundant source is an important task. Data mining and text mining are processes for extracting unknown and useful information. Text mining is the process of extracting interesting and non-trivial patterns or knowledge from text documents. This paper presents the related activities and focuses on preprocessing steps in text mining. [ABSTRACT FROM AUTHOR]
Published: 2013

15. A longitudinal study of HotMap web search.

Author: Hoeber, Orland
Subjects: *WEB search engines, *SEARCH engines, *INTERNET searching, *INFORMATION retrieval research, *DATA mining, *INFORMATION services research
Abstract: Purpose – HotMap web search was designed to support exploratory search tasks by adding lightweight visual and interactive features to the commonly used list-based representation of web search results. Although laboratory user studies are the most common method for empirically validating the utility of information visualization and information retrieval systems such as this, it is difficult to determine if such studies accurately reflect the tasks of real users. This paper aims to address these issues. Design/methodology/approach – A longitudinal user evaluation was conducted in two phases over a ten-week period to determine how this novel web search interface was being used and accepted in real-world settings. Findings – Although the interactive features were not used as extensively as expected, there is evidence that the participants did find them useful. Participants were able to refine their queries easily, although most did so manually. Those that used the interactive exploration features were able to effectively discover potentially relevant documents buried deep in the search results list. Subjective reactions regarding the usefulness and ease-of-use of the system were positive, and more than half of the participants continued to use the system even after the study ended. Originality/value – As a result of conducting this longitudinal study, the author has gained a deeper understanding of how a particular visual and interactive web search interface is being used in the real world, as well as issues associated with resistance to change. These findings may provide guidance for the design, development, and study of next generation interfaces for online information retrieval. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

16. Rotation-invariant similarity in time series using bag-of-patterns representation.

Author: Lin, Jessica, Khade, Rohan, and Li, Yuan
Subjects: TIME series analysis, DATA mining, HISTOGRAMS, TEXT mining, INFORMATION retrieval research
Abstract: For more than a decade, time series similarity search has been given a great deal of attention by data mining researchers. As a result, many time series representations and distance measures have been proposed. However, most existing work on time series similarity search relies on shape-based similarity matching. While some of the existing approaches work well for short time series data, they typically fail to produce satisfactory results when the sequence is long. For long sequences, it is more appropriate to consider the similarity based on the higher-level structures. In this work, we present a histogram-based representation for time series data, similar to the 'bag of words' approach that is widely accepted by the text mining and information retrieval communities. We performed extensive experiments and show that our approach outperforms the leading existing methods in clustering, classification, and anomaly detection on dozens of real datasets. We further demonstrate that the representation allows rotation-invariant matching in shape datasets. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

17. Semantic storyboard of judicial debates: a novel multimedia summarization environment.

Author: Fersini, E. and Sartori, F.
Subjects: *DATA mining, *INFORMATION retrieval research, *JUSTICE administration websites, *DIGITAL video, *ONTOLOGIES (Information retrieval)
Abstract: Purpose – The need of tools for content analysis, information extraction and retrieval of multimedia objects in their native form is strongly emphasized into the judicial domain: digital videos represent a fundamental informative source of events occurring during judicial proceedings that should be stored, organized and retrieved in short time and with low cost. This paper seeks to address these issues. Design/methodology/approach – In this context the JUMAS system, stem from the homonymous European Project (www.jumasproject.eu), takes up the challenge of exploiting semantics and machine learning techniques towards a better usability of multimedia judicial folders. Findings – In this paper one of the most challenging issues addressed by the JUMAS project is described: extracting meaningful abstracts of given judicial debates in order to efficiently access salient contents. In particular, the authors present an ontology enhanced multimedia summarization environment able to derive a synthetic representation of judicial media contents by a limited loss of meaningful information while overcoming the information overload problem. Originality/value – The adoption of ontology-based query expansion has made it possible to improve the performance of multimedia summarization algorithms with respect to the traditional approaches based on statistics. The effectiveness of the proposed approach has been evaluated on real media contents, highlighting a good potential for extracting key events in the challenging area of judicial proceedings. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

18. Challenges and Opportunities in Mining Neuroscience Data.

Author: Akil, Huda, Martone, Maryann E., and Van Essen, David C.
Subjects: *NEUROSCIENCES, *DATA mining, *DATA, *INFORMATION retrieval research, *BRAIN research, *NEUROINFORMATICS, *NEURAL transmission
Abstract: Understanding the brain requires a broad range of approaches and methods from the domains of biology, psychology, chemistry, physics, and mathematics. The fundamental challenge is to decipher the "neural choreography" associated with complex behaviors and functions, including thoughts, memories, actions, and emotions. This demands the acquisition and integration of vast amounts of data of many types, at multiple scales in time and in space. Here we discuss the need for neuroinformatics approaches to accelerate progress, using several illustrative examples. The nascent field of "connectomics" aims to comprehensively describe neuronal connectivity at either a macroscopic level (in long-distance pathways for the entire brain) or a microscopic level (among axons, dendrites, and synapses in a small brain region). The Neuroscience Information Framework (NIF) encompasses all of neuroscience and facilitates the integration of existing knowledge and databases of many types. These examples illustrate the opportunities and challenges of data mining across multiple tiers of neuroscience information and underscore the need for cultural and infrastructure changes if neuroinformatics is to fulfill its potential to advance our understanding of the brain. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

19. A Similarity Measure for Indefinite Rankings.

Author: Webber, William, Moffat, Alistair, and Zobel, Justin
Subjects: *INFORMATION resource research, *INTERNET searching, *INFORMATION storage & retrieval systems, *DATA mining, *SEARCH engines, *INFORMATION retrieval research, *WEB search engines, *ELECTRONIC information resource searching
Abstract: Ranked lists are encountered in research and daily life and it is often of interest to compare these lists even when they are incomplete or have only some members in common. An example is document rankings returned for the same query by different search engines. A measure of the similarity between incomplete rankings should handle nonconjointness, weight high ranks more heavily than low, and be monotonic with increasing depth of evaluation; but no measure satisfying all these criteria currently exists. In this article, we propose a new measure having these qualities, namely rank-biased overlap (RBO). The RBO measure is based on a simple probabilistic user model. It provides monotonicity by calculating, at a given depth of evaluation, a base score that is non-decreasing with additional evaluation, and a maximum score that is nonincreasing. An extrapolated score can be calculated between these bounds if a point estimate is required. RBO has a parameter which determines the strength of the weighting to top ranks. We extend RBO to handle tied ranks and rankings of different lengths. Finally, we give examples of the use of the measure in comparing the results produced by public search engines and in assessing retrieval systems in the laboratory. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF
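
Illustration: the abstract above describes rank-biased overlap (RBO) with a parameter that sets how strongly top ranks are weighted. The sketch below computes the truncated sum (1 - p) * sum over d of p^(d-1) * A_d, where A_d is the size of the overlap between the two top-d prefixes divided by d. This truncated sum is a lower bound on the full measure that does not decrease with depth; the base and extrapolated scores mentioned in the abstract add tail terms that are not reproduced here. The function name is an assumption, and the lists are assumed to contain no duplicate items.

```python
def rbo_prefix(s, t, p=0.9, depth=None):
    """Truncated rank-biased overlap of rankings s and t down to the given depth."""
    limit = min(len(s), len(t))
    depth = limit if depth is None else min(depth, limit)
    seen_s, seen_t = set(), set()
    total = 0.0
    for d in range(1, depth + 1):
        seen_s.add(s[d - 1])
        seen_t.add(t[d - 1])
        agreement = len(seen_s & seen_t) / d  # A_d: overlap of the top-d prefixes
        total += p ** (d - 1) * agreement
    return (1 - p) * total


# Toy rankings (hypothetical) that share two of their top three items.
ranking_a = ["a", "b", "c"]
ranking_b = ["b", "a", "d"]
similarity = rbo_prefix(ranking_a, ranking_b, p=0.5)
```

Smaller values of p concentrate the weight on the first few ranks; values close to 1 spread it deeper into the lists.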

20. Focused retrieval and result aggregation with political data.

Author: Kaptein, Rianne and Marx, Maarten
Subjects: *CASE studies, *TRANSCRIPTION (Linguistics), *SEARCH engines, *INFORMATION retrieval research, *MEETINGS, *DATA mining, *ANNOTATIONS
Abstract: This paper presents a case-study in which we use a large semi-structured data set consisting of official transcripts of meetings of the Dutch parliament for focused retrieval and result aggregation. Transcripts of meetings are a document genre characterized by a complex narrative structure. The essence is not only what is said, but also by who and to whom. We have notes of more than 40 years of Dutch parliamentary debates where this structure is exploited to automatically make semantic annotations. These annotations yield numerous new ways of searching, browsing, mining and summarizing these documents. Concerning result aggregation, we summarise and visualise the structure of meetings into tables of content and interruption graphs. The contents of meetings or parts of meetings are condensed into word clouds that are created using a parsimonious language model. Furthermore, we have developed a search engine that exploits the structure and annotations of our data making it possible to provide entry points, to group search results, and to use faceted search techniques for data-exploration. Evaluation shows that our content and structure summarization tools provide a good first impression of a debate. Users reported that, compared to a standard document retrieval system, our search engine gives a better overview of the data. Search tasks are performed faster and the users felt more certain of their answers. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

21. INFORMATION EXTRACTION TECHNIQUES FOR HEALTH, SAFETY AND ENVIRONMENT APPLICATIONS IN OIL INDUSTRY.

Author: Sanchez-Pi, Nayat, Orosa, Luis Marti, and Bicharra Garcia, Ana Cristina
Subjects: DATA mining, INFORMATION retrieval research, ONTOLOGIES (Information retrieval), AUTOMATIC extracting (Information science), INDUSTRIAL hygiene research
Abstract: The accident investigation process in the oil industry is a critical activity. In accident investigation, the volume of information collected and analyzed makes analysis of its causes a challenging task. Making sense of this large volume of information is also challenging because of the diverse interpretations made by experts and auditors. With the advances of new technologies, the data of these systems have become increasingly large. The main objective is to propose and evaluate information extraction techniques in the occupational health control process, particularly for automatic detection of accidents from unstructured texts. Our proposal divides the problem into subtasks such as text analysis, recognition and classification of failed occupational health control, and resolving accidents. [ABSTRACT FROM AUTHOR]
Published: 2013

22. Selection Search on Meta Search Engine.

Author: Patel, Biraj and Shah, Dipti
Subjects: SEARCH engines, ELECTRONIC information resource searching, ENGINEERING databases, INFORMATION storage & retrieval systems, DATA mining, INFORMATION retrieval research
Abstract: Existing meta search engines take an identical approach to using search engines for retrieving links. They send requests for links to a fixed number of search engines, retrieve the results, and display the aggregate result on screen after eliminating duplicates. Existing meta search engines offer no facility for user choice, so there is a need for dynamic selection. This paper discusses the need for dynamic selection of search engines in a meta search engine and presents a model that allows the user to select search engines. The meta search engine then sends requests to the selected search engines and generates and displays the aggregate result. [ABSTRACT FROM AUTHOR]
Published: 2013
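
Illustrative sketch (not taken from the article): the flow described in the abstract, in which the user picks the search engines, the meta search engine queries only those, and the aggregate result is shown with duplicates removed, might look roughly like the following. Everything named here is hypothetical: the `meta_search` function, the engine registry, and the stub engines stand in for real search back ends, and the order-preserving merge is only one possible aggregation rule.

```python
from typing import Callable, Dict, Iterable, List

# An engine is modelled as a function from a query string to a list of links.
Engine = Callable[[str], List[str]]


def meta_search(
    query: str,
    engines: Dict[str, Engine],
    selected: Iterable[str],
) -> List[str]:
    """Query only the user-selected engines and merge their links.

    Links keep the order in which they are first seen; later duplicates
    are dropped.
    """
    seen = set()
    merged: List[str] = []
    for name in selected:
        if name not in engines:
            raise KeyError(f"unknown search engine: {name}")
        for link in engines[name](query):
            if link not in seen:
                seen.add(link)
                merged.append(link)
    return merged


# Hypothetical stub engines returning fixed links, for demonstration only.
def engine_one(query: str) -> List[str]:
    return ["http://example.org/a", "http://example.org/b"]


def engine_two(query: str) -> List[str]:
    return ["http://example.org/b", "http://example.org/c"]


def engine_three(query: str) -> List[str]:
    return ["http://example.org/d"]


REGISTRY: Dict[str, Engine] = {
    "one": engine_one,
    "two": engine_two,
    "three": engine_three,
}

if __name__ == "__main__":
    # The user selects two of the three available engines.
    for link in meta_search("data mining", REGISTRY, ["one", "two"]):
        print(link)
```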
