Toward Computer-Assisted Text Curation: Classification Is Easy (Choosing Training Data Can Be Hard...).

Authors :
Denroche, Robert
Madupu, Ramana
Yooseph, Shibu
Sutton, Granger
Shatkay, Hagit
Source :
Linking Literature, Information & Knowledge for Biology; 2010, p33-42, 10p
Publication Year :
2010

Abstract

We aim to design a system for classifying scientific articles based on the presence of protein characterization experiments, intending to aid the curators populating JCVI's Characterized Protein (CHAR) Database of experimentally characterized proteins. We trained two classifiers using small datasets labeled by CHAR curators, and another classifier based on a much larger dataset using annotations from public databases. Performance varied greatly, in ways we did not anticipate. We describe the datasets, the classification method, and discuss the unexpected results. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISBNs :
9783642131301
Database :
Complementary Index
Journal :
Linking Literature, Information & Knowledge for Biology
Publication Type :
Book
Accession number :
76848329
Full Text :
https://doi.org/10.1007/978-3-642-13131-8_5