A Probabilistic Similarity Metric for Medline Records: A Model for Author Name Disambiguation.

Authors :
Torvik, Vetle I.
Weeber, Marc
Swanson, Don R.
Smalheiser, Neil R.
Source :
Journal of the American Society for Information Science & Technology. Jan 2005, Vol. 56 Issue 2, p140-158. 19p.
Publication Year :
2005

Abstract

In this article, the authors present a model for estimating the probability that a pair of author names appearing on two different articles refer to the same individual. Bioinformatics research databases have dramatically accelerated the pace of discovery in the biomedical sciences. Among these, Medline is the oldest and the best curated, and arguably it contains the most scientific information, insofar as it summarizes knowledge published across all biomedical fields. Medline and its most popular search interface, PubMed, have devoted a great deal of attention to the comprehensive retrieval of papers according to their subject content. To this end, each paper in Medline is indexed with hierarchical, controlled-vocabulary Medical Subject Headings (MeSH), and this information is used automatically in query processing by PubMed. The model uses a simple yet powerful similarity profile between a pair of articles, based on title, journal name, coauthor names, Medical Subject Headings, language, affiliation, and name attributes. The distribution of similarity profiles is computed from two reference sets, one consisting almost exclusively of article pairs whose author names match and one of pairs whose author names do not match, both generated in an unbiased manner. Although the match set is generated automatically and may contain a small proportion of non-matches, the model is quite robust against such contamination.
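The abstract describes the model only at a high level: each article pair is mapped to a similarity profile over several attributes, the profile's distribution is estimated separately on a match set and a non-match set, and the two estimates are combined into a probability that the names refer to the same person. The Python sketch that follows illustrates that general idea under explicit assumptions. The `Article` record, the stop-word list, the feature caps, the add-one smoothing, and the 0.5 prior are all hypothetical choices made for illustration; they are not the features, estimator, or priors used in the paper, whose estimation procedure is not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical article record; fields loosely mirror the attributes named in the abstract.
@dataclass(frozen=True)
class Article:
    title: str
    journal: str
    coauthors: frozenset   # coauthor names other than the name being compared
    mesh: frozenset        # Medical Subject Headings
    language: str
    affiliation: str
    first_name: str        # first-name field of the author name being compared

# Illustrative stop-word list (assumption, not the paper's list).
STOP = {"a", "an", "the", "of", "in", "and", "for", "to", "with", "on"}

def title_words(title: str) -> set:
    return {w for w in title.lower().split() if w not in STOP}

def profile(a: Article, b: Article) -> tuple:
    """Map an article pair to a small discrete similarity profile (hypothetical features)."""
    return (
        min(len(title_words(a.title) & title_words(b.title)), 3),    # shared title words, capped at 3
        int(a.journal == b.journal),                                  # same journal
        min(len(a.coauthors & b.coauthors), 2),                       # shared coauthors, capped at 2
        min(len(a.mesh & b.mesh), 3),                                 # shared MeSH terms, capped at 3
        int(a.language == b.language),                                # same language
        int(bool(a.affiliation) and a.affiliation == b.affiliation),  # same non-empty affiliation
        int(a.first_name == b.first_name),                            # same first name
    )

def fit(match_pairs, nonmatch_pairs):
    """Count profile frequencies in the match and non-match reference sets."""
    m = Counter(profile(a, b) for a, b in match_pairs)
    n = Counter(profile(a, b) for a, b in nonmatch_pairs)
    return m, n

def p_match(x, m, n, prior=0.5, alpha=1.0):
    """Posterior P(match | x) via Bayes' rule with add-alpha smoothing (assumed estimator)."""
    cells = len(set(m) | set(n)) + 1  # observed profiles plus one cell for unseen ones
    pm = (m[x] + alpha) / (sum(m.values()) + alpha * cells)  # estimate of P(x | match)
    pn = (n[x] + alpha) / (sum(n.values()) + alpha * cells)  # estimate of P(x | non-match)
    return prior * pm / (prior * pm + (1 - prior) * pn)

# Toy usage with made-up records.
a1 = Article("gene expression in yeast cells", "Genetics", frozenset({"Smith J"}),
             frozenset({"Yeast", "Genes"}), "eng", "Univ X", "John")
a2 = Article("yeast gene regulation", "Genetics", frozenset({"Smith J"}),
             frozenset({"Yeast"}), "eng", "Univ X", "John")
b1 = Article("heart failure outcomes", "Cardiology", frozenset({"Lee K"}),
             frozenset({"Heart"}), "eng", "Hosp Y", "Jane")
m, n = fit([(a1, a2)], [(a1, b1)])
print(p_match(profile(a1, a2), m, n))  # about 0.667
print(p_match(profile(a1, b1), m, n))  # about 0.333
```

Because the posterior combines a likelihood ratio with a prior, a profile seen mostly among match pairs pushes the probability above the prior, and one seen mostly among non-match pairs pushes it below. With realistic reference sets, many full profiles would be rare, so a raw lookup table like this one would need a more careful estimator.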

Details

Language :
English
ISSN :
15322882
Volume :
56
Issue :
2
Database :
Academic Search Index
Journal :
Journal of the American Society for Information Science & Technology
Publication Type :
Academic Journal
Accession number :
15632993
Full Text :
https://doi.org/10.1002/asi.20105