SG-WSTD: A framework for scalable geographic web search topic discovery.
- Source :
- Knowledge-Based Systems. Aug 2015, Vol. 84, p18-33. 16p.
- Publication Year :
- 2015
Abstract
- Search engine query logs are recognized as an important information source that records the web search needs of millions of users. Discovering Geographic Web Search Topics (G-WSTs) from a query log can support a variety of downstream web applications, such as finding commonalities between locations and profiling search engine users. However, discovering G-WSTs is nontrivial, not only because of the diversity of the information in web search but also because of the sheer size of the query log. In this paper, we propose a new framework, Scalable Geographic Web Search Topic Discovery (SG-WSTD), which provides highly scalable functionalities such as search session derivation, geographic information extraction and geographic web search topic discovery to discover G-WSTs from a query log. Within SG-WSTD, two probabilistic topic models are proposed to discover G-WSTs from two complementary perspectives. The first is the Discrete Search Topic Model (DSTM), which discovers G-WSTs that capture the commonalities between discrete locations. The second is the Regional Search Topic Model (RSTM), which focuses on a specific geographic region on the map and discovers G-WSTs that demonstrate geographic locality. Since query logs are typically voluminous, we implement the functionalities in SG-WSTD based on the MapReduce paradigm to address the efficiency bottleneck. We evaluate SG-WSTD against several strong baselines on a real-life query log from AOL. In the experiments, the proposed framework demonstrates significantly improved data interpretability, better prediction performance, higher topic distinctiveness and superior scalability. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 09507051
- Volume :
- 84
- Database :
- Academic Search Index
- Journal :
- Knowledge-Based Systems
- Publication Type :
- Academic Journal
- Accession number :
- 102658704
- Full Text :
- https://doi.org/10.1016/j.knosys.2015.03.020