An Automatic Unsupervised Querying Algorithm for Efficient Information Extraction in Biomedical Domain.

Authors :
Ho, Tu Bao
Cheung, David
Liu, Huan
Song, Min
Song, Il-Yeol
Hu, Xiaohua
Allen, Robert B.
Source :
Advances in Knowledge Discovery & Data Mining (9783540260769); 2005, p173-179, 7p
Publication Year :
2005

Abstract

In bioinformatics, extracting a relation such as protein-protein interactions from a large database of text documents is a challenging task. A major issue in biomedical information extraction is how to efficiently process the sheer volume of unstructured biomedical text. Often, only a small fraction of the documents in such a corpus contain information relevant to the extraction task. We propose a novel query expansion algorithm that automatically discovers the characteristics of documents useful for extracting a target relation. Our technique introduces a hybrid query re-weighting algorithm that combines a modified Robertson Sparck-Jones query ranking algorithm with a keyphrase extraction algorithm. It also adopts a novel query translation technique that incorporates part-of-speech (POS) categories into query translation. We conduct a series of experiments and report the results, which show that our technique retrieves more documents containing protein-protein pairs from MEDLINE as the number of iterations increases. We also compare our technique with SLIPPER, a supervised rule-based query expansion technique; our technique outperforms SLIPPER by 17.90% to 29.98% over four iterations. [ABSTRACT FROM AUTHOR]
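
Illustrative note: the abstract names a modified Robertson Sparck-Jones ranking but does not give the modification. As a rough sketch only, the Python below computes the classic, unmodified Robertson/Sparck Jones relevance weight with the usual 0.5 smoothing and uses it to rank candidate expansion terms. The function names, the term counts, and the corpus sizes are hypothetical and are not taken from the paper; the keyphrase extraction and POS-based query translation steps are not shown.

```python
import math


def rsj_weight(r, n, R, N):
    """Classic Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents that contain the term
    n: all documents that contain the term
    R: relevant documents
    N: all documents in the collection
    NOTE: this is the standard textbook form, not the paper's modified variant.
    """
    if not (0 <= r <= R <= N and r <= n <= N and n - r <= N - R):
        raise ValueError("inconsistent document counts")
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


def rank_terms(term_stats, R, N, top_k=5):
    """Rank candidate query terms by RSJ weight, highest first.

    term_stats maps each term to a pair (r, n) as defined in rsj_weight.
    """
    weights = {t: rsj_weight(r, n, R, N) for t, (r, n) in term_stats.items()}
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical counts: 10 documents judged useful out of 1000 retrieved.
example_stats = {
    "interacts": (8, 10),   # concentrated in the useful documents
    "binds": (6, 40),
    "patient": (1, 500),    # common everywhere, so it scores low
}
ranked = rank_terms(example_stats, R=10, N=1000)
```

In this sketch, terms concentrated in the useful documents receive positive weights and would be the natural candidates to add to the next query, while terms spread across the whole collection receive negative weights.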

Details

Language :
English
ISBNs :
9783540260769
Database :
Supplemental Index
Journal :
Advances in Knowledge Discovery & Data Mining (9783540260769)
Publication Type :
Book
Accession number :
32883114
Full Text :
https://doi.org/10.1007/11430919_22