Author: "Snowsill, Tristan" / Publication Type: Dissertations - Searchworks@Jio Institute Digital Library Search Results

1. Data mining in text streams using suffix trees

Author: Snowsill, Tristan
Subjects: 006.312
Abstract: Data mining in text streams, or text stream mining, is an increasingly important topic for a number of reasons, including the recent explosion in the availability of textual data and an increasing need for people and organisations to process and understand as much of that information as possible, from single users to multinational corporations and governments. In this thesis we present a data structure based on a generalised suffix tree which is capable of solving a number of text stream mining tasks. It can be used to detect changes in the text stream, detect when chunks of text are reused and detect events through identifying when the frequencies of phrases change in a statistically significant way. Suffix trees have been used for many years in the areas of combinatorial pattern matching and computational genomics. In this thesis we demonstrate how the suffix tree can become more widely applicable by making it possible to use suffix trees to analyse streams of data rather than static data sets, opening up a number of future avenues for research. The algorithms which we present are designed to be efficient in an on-line setting by having time complexity independent of the total amount of text seen and polynomial in the rate at which text is seen. We demonstrate the effectiveness of our methods on a large text stream comprising thousands of documents every day. This text stream is the stream of text news coming from over 600 online news outlets and the results obtained are of interest to news consumers, journalists and social scientists.
Published: 2012
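
Illustrative sketch (not taken from the thesis): the abstract describes counting phrases and spotting reused chunks of text with a generalised suffix tree. The Python below is a much simpler stand-in, a naive word-level suffix trie with a depth bound, meant only to show the kind of queries such an index can answer. The names `SuffixTrie`, `max_depth`, `add`, `count` and `longest_reuse` are hypothetical, and nothing here reproduces the thesis's actual data structure or its complexity guarantees.

```python
class Node:
    """One trie node: child words, phrase count, and contributing documents."""
    __slots__ = ("children", "count", "docs")

    def __init__(self):
        self.children = {}
        self.count = 0
        self.docs = set()


class SuffixTrie:
    """Naive word-level generalised suffix trie (assumption: NOT a
    linear-time suffix tree; depth is capped at max_depth words)."""

    def __init__(self, max_depth=5):
        self.max_depth = max_depth
        self.root = Node()

    def add(self, doc_id, text):
        """Index every word-level suffix of text, truncated to max_depth."""
        words = text.lower().split()
        for start in range(len(words)):
            node = self.root
            for word in words[start:start + self.max_depth]:
                node = node.children.setdefault(word, Node())
                node.count += 1
                node.docs.add(doc_id)

    def _find(self, phrase):
        """Walk the trie along phrase; return the final node or None."""
        node = self.root
        for word in phrase.lower().split():
            node = node.children.get(word)
            if node is None:
                return None
        return node

    def count(self, phrase):
        """Number of occurrences of phrase seen so far."""
        node = self._find(phrase)
        return node.count if node else 0

    def docs(self, phrase):
        """Ids of the documents that contain phrase."""
        node = self._find(phrase)
        return set(node.docs) if node else set()

    def longest_reuse(self, text):
        """Longest run of words in text (up to max_depth) already indexed."""
        words = text.lower().split()
        best = []
        for start in range(len(words)):
            node, depth = self.root, 0
            for word in words[start:start + self.max_depth]:
                node = node.children.get(word)
                if node is None:
                    break
                depth += 1
            if depth > len(best):
                best = words[start:start + depth]
        return best


trie = SuffixTrie(max_depth=4)
trie.add("a", "the quick brown fox jumps")
trie.add("b", "a quick brown dog")
print(trie.count("quick brown"))
print(trie.longest_reuse("one quick brown fox ran"))
```

Because each suffix is truncated to `max_depth` words, the cost of adding a document in this sketch depends on the document's length and the depth bound rather than on how much text has already been indexed. Detecting statistically significant frequency changes, as the abstract describes, would need time-stamped counts and a significance test on top of this, which the sketch does not attempt.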
