Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents
- Publication Year :
- 2019
- Publisher :
- Nanyang Technological University, 2019.
Abstract
- With the widespread use of social media such as Facebook and Twitter, a large amount of data with both spatial and temporal information has become available. Topic modelling has been a useful tool for uncovering latent information in such data. This thesis considers a specific type of topic model computational problem called topic-range queries, where the topic model of interest is restricted to the data records that fall within a dynamically specified geographic region and time period. One naive approach is to first apply a range query to retrieve the data items falling within the specified spatio-temporal range, then derive the topic model from the retrieved data using a known algorithm such as LDA (Latent Dirichlet Allocation). When dealing with a large volume of data, however, each of the two steps of the naive approach can take a substantial amount of time. Novel algorithms for expediting topic-range queries have been designed, including the fast topic-combining algorithm FSS (Fast Set Sampling), which indexes the dataset with a tree and pre-computes the topic model of the subset of data associated with each node of the tree. To answer a topic-range query, the tree nodes covered by the range query are identified, and the pre-computed topic models associated with these tree nodes are merged to produce an approximate result. Compared to the naive approach, this approximation of the topic model can substantially reduce runtime. In the original design of the FSS algorithm, Cube trees are used as the indexing structure to support spatio-temporal range queries. In the literature, however, Range Trees offer a better worst-case query time guarantee for a range query. This master's thesis thus considers a new combination of Range Trees and FSS (called Topic Ranger) to support topic-range queries. The thesis presents the design and implementation of several versions of Topic Ranger that trade off execution time against memory space.
It also documents experiments comparing the execution time and the quality of the resulting approximate topic models against those of the original FSS scheme. Master of Engineering
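The query pattern described in the abstract (index the data with a tree, pre-compute a summary per node, then merge the summaries of the nodes covered by a range) can be illustrated with a minimal sketch. This is not the thesis's FSS or Topic Ranger implementation: it assumes a one-dimensional time axis instead of a spatio-temporal range, uses a plain balanced binary tree rather than a Cube tree or Range Tree, and substitutes simple word counts for a pre-computed topic model. All names (`Node`, `build`, `covered_nodes`, `range_summary`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


class Node:
    """Tree node covering docs[lo:hi], holding a pre-computed summary."""

    def __init__(self, lo, hi, summary, left=None, right=None):
        self.lo, self.hi = lo, hi
        self.summary = summary  # stand-in for a pre-computed topic model
        self.left, self.right = left, right


def build(docs, lo, hi):
    """Build a balanced tree over docs[lo:hi].

    docs is a non-empty list of (timestamp, words) pairs sorted by timestamp.
    """
    if hi - lo == 1:
        return Node(lo, hi, Counter(docs[lo][1]))
    mid = (lo + hi) // 2
    left, right = build(docs, lo, mid), build(docs, mid, hi)
    # The parent's summary is pre-computed once, at indexing time.
    return Node(lo, hi, left.summary + right.summary, left, right)


def covered_nodes(node, lo, hi):
    """Yield the maximal nodes whose span lies entirely inside [lo, hi)."""
    if hi <= node.lo or node.hi <= lo:
        return  # no overlap with the query range
    if lo <= node.lo and node.hi <= hi:
        yield node  # fully covered: reuse its pre-computed summary
        return
    yield from covered_nodes(node.left, lo, hi)
    yield from covered_nodes(node.right, lo, hi)


def range_summary(root, docs, t_start, t_end):
    """Merge the summaries of nodes covering timestamps in [t_start, t_end]."""
    times = [t for t, _ in docs]
    lo = bisect_left(times, t_start)
    hi = bisect_right(times, t_end)
    merged = Counter()
    for node in covered_nodes(root, lo, hi):
        merged += node.summary
    return merged


docs = [(1, ["a", "b"]), (2, ["b"]), (3, ["c", "a"]), (4, ["d"])]
root = build(docs, 0, len(docs))
print(range_summary(root, docs, 1, 3))
```

In this toy version the merge is exact because word counts add up. With real topic models the merge step is where the approximation mentioned in the abstract would enter, and the choice of index structure determines how many nodes a query has to visit.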
- Subjects :
- Computer science
Computer science and engineering::Data [Engineering]
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....3bcea76439ad009999b36eace38a8d44
- Full Text :
- https://doi.org/10.32657/10220/49683