Temporal Analog Retrieval using Transformation over Dual Hierarchical Structures

Authors :
Katsumi Tanaka
Yating Zhang
Adam Jatowt
Source :
CIKM
Publication Year :
2017
Publisher :
ACM, 2017.

Abstract

In recent years, we have witnessed a rapid increase in text content stored in digital archives such as newspaper archives or web archives. Many old documents have been converted to digital form and made accessible online. Due to the passage of time, however, it is difficult to search such collections effectively. Users, especially younger ones, may have trouble finding appropriate keywords because of the terminology gap between their own knowledge and the unfamiliar domain of archival collections. In this paper, we provide a general framework that bridges different domains across time and thereby facilitates search and comparison as if they were carried out in the user's familiar domain (i.e., the present). In particular, we propose to find analogical terms across temporal text collections by applying a series of transformation procedures. We develop a cluster-biased transformation technique which makes use of hierarchical cluster structures built on the temporally distributed document collections. Our methods do not need any specially prepared training data and can be applied to diverse collections and time periods. We test the performance of the proposed approaches on collections separated by both short (e.g., 20 years) and long (70 years) time gaps, and we report improvements in the range of 18%-27% over short periods and 56%-92% over long periods when compared to state-of-the-art baselines.
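The abstract does not spell out the transformation procedures, so the following is only an illustrative sketch of the general idea of mapping a present-day term into an older vocabulary. It assumes, purely for illustration, that each period has its own term-vector space and that a single global linear map between them can be fitted from a few anchor terms shared by both periods. It does not implement the cluster-biased technique or the hierarchical structures described in the paper. All names (`fit_linear_map`, `find_analogs`), the 2-D vectors, and the example terms are hypothetical toy data.

```python
import math

def fit_linear_map(anchor_pairs, dim, lr=0.1, epochs=500):
    """Fit a dim x dim matrix M so that M @ x is close to y for each anchor pair.

    Assumption: a plain global linear map trained by per-sample gradient
    descent on squared error. This is NOT the paper's cluster-biased method.
    """
    M = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, y in anchor_pairs:
            pred = [sum(M[i][j] * x[j] for j in range(dim)) for i in range(dim)]
            for i in range(dim):
                err = pred[i] - y[i]
                for j in range(dim):
                    M[i][j] -= lr * err * x[j]
    return M

def apply_map(M, x):
    """Multiply matrix M by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def find_analogs(query_vec, M, past_vectors):
    """Map a present-day vector into the past space and rank past terms."""
    mapped = apply_map(M, query_vec)
    scored = [(term, cosine(mapped, vec)) for term, vec in past_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical anchor terms: (present-space vector, past-space vector).
# In this toy setup the past space is the present space rotated by 90 degrees.
anchors = [
    ((1.0, 0.0), (0.0, 1.0)),
    ((0.0, 1.0), (-1.0, 0.0)),
]
M = fit_linear_map(anchors, dim=2)

# Hypothetical past-period vocabulary and a present-day query vector.
past_vectors = {
    "walkman": (-0.75, 0.65),
    "telegraph": (0.9, 0.1),
    "gramophone": (-0.2, -0.9),
}
ipod_present = (0.6, 0.8)
ranked = find_analogs(ipod_present, M, past_vectors)
```

With this toy data, `ranked` lists "walkman" first, because the fitted map rotates the query vector into the region of the past space where that term lies. Real collections would need vectors trained on each period's documents and a principled way to choose anchor terms, neither of which is shown here.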

Details

Database :
OpenAIRE
Journal :
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
Accession number :
edsair.doi...........01ad7abce92b73d811927702f829ef9d