
Learning to Find Relevant Biological Articles without Negative Training Examples

Authors :
Keith Noto
Charles Elkan
Milton H. Saier
Source :
AI 2008: Advances in Artificial Intelligence (Australasian Conference on Artificial Intelligence)
Publication Year :
2008
Publisher :
Springer Berlin Heidelberg, 2008.

Abstract

Classifiers are traditionally learned from sets of positive and negative training examples. Often, however, a classifier is needed when the only training data available are an incomplete set of positive examples and a set of unlabeled examples. This is the situation, for example, with the Transport Classification Database (TCDB, www.tcdb.org), a repository of information about proteins involved in transmembrane transport. This paper presents and evaluates a method that learns to rank newly published scientific articles by their likely relevance to TCDB, using the articles currently referenced in TCDB as positive training examples. The new method identified 964 new articles relevant to TCDB in fewer than six months, a major practical success. From a general data mining perspective, the contributions of this paper are (i) evaluating two novel approaches that solve the positive-only problem effectively, (ii) applying support vector machines in a state-of-the-art way to recognize and rank relevance, and (iii) deploying a system to update a widely used, real-world biomedical database. Supplementary information, including all data sets, is publicly available at www.cs.ucsd.edu/users/knoto/pub/ajcai08.
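The positive-only setting can be illustrated with a small sketch of the common baseline: treat every unlabeled document as a provisional negative, train a linear SVM to separate the known positives from the unlabeled set, and rank new candidate articles by SVM score. This is not either of the two approaches evaluated in the paper, which the abstract does not describe. The weakness of the baseline is that any relevant article hiding in the unlabeled set is trained as a negative. All function names (`rank_candidates`, `train_linear_svm`, etc.), the whitespace tokenizer, the term-frequency features, and the toy documents are hypothetical. The SVM is a Pegasos-style stochastic subgradient solver, swapped in so the example needs only the standard library.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization (a toy stand-in for real text processing).
    return text.lower().split()


def build_vocab(docs):
    # Map each distinct training token to a fixed feature index.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    # L2-normalized term-frequency vector; tokens outside the vocabulary are dropped.
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    # Pegasos-style stochastic subgradient descent on the regularized hinge loss
    # (a linear SVM without a bias term).
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def rank_candidates(positives, unlabeled, candidates, **svm_args):
    # Hypothetical baseline: positives get label +1, every unlabeled document
    # gets label -1, then candidates are sorted by SVM score (highest first).
    vocab = build_vocab(positives + unlabeled)
    X = [vectorize(d, vocab) for d in positives + unlabeled]
    y = [1] * len(positives) + [-1] * len(unlabeled)
    w = train_linear_svm(X, y, **svm_args)
    scored = [(dot(w, vectorize(d, vocab)), d) for d in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    positives = [
        "membrane transport protein channel",
        "transport carrier membrane protein",
        "ion channel transport protein",
    ]
    unlabeled = [
        "stock market prices fell",
        "football match results today",
        "market weather report today",
    ]
    candidates = ["stock market report", "new membrane transport protein study"]
    # The transport-related candidate is ranked first on this toy data.
    for score, doc in rank_candidates(positives, unlabeled, candidates):
        print(f"{score:+.3f}  {doc}")
```

Ranking by raw score, rather than thresholding a 0/1 prediction, matches the abstract's framing of the task as ordering new articles by likely relevance so that curators can review the top of the list first.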

Details

ISBN :
978-3-540-89377-6
Database :
OpenAIRE
Accession number :
edsair.doi...........93674c6a7d60b2c3db5cc7693472244a
Full Text :
https://doi.org/10.1007/978-3-540-89378-3_20