Pattern Discovery and Detection: A Unified Statistical Methodology.

Authors :
Hand, David J.
Bolton, Richard J.
Source :
Journal of Applied Statistics. Oct 2004, Vol. 31 Issue 8, p885-924. 40p.
Publication Year :
2004

Abstract

Modern statistical data analysis is predominantly model-driven, seeking to decompose an observed data distribution in terms of major underlying descriptive features modified by some stochastic variation. A large part of data mining is also concerned with this exercise. However, another fundamental part of data mining is concerned with detecting anomalies amongst the vast mass of the data: the small deviations, unusual observations, unexpected clusters of observations, or surprising blips in the data, which the model does not explain. We call such anomalies patterns. For sound reasons, which are outlined in the paper, the data mining community has tended to focus on the algorithmic aspects of pattern discovery, and has not developed any general underlying theoretical base. However, such a base is important for any technology: it helps to steer the direction in which the technology develops, as well as serving to provide a basis from which algorithms can be compared, and to indicate which problems are the important ones waiting to be solved. This paper attempts to provide such a theoretical base, linking the ideas to statistical work in spatial epidemiology, scan statistics, outlier detection, and other areas. One of the striking characteristics of work on pattern discovery is that the ideas have been developed in several theoretical arenas, and also in several application domains, with little apparent awareness of the fundamentally common nature of the problem. Like model building, pattern discovery is fundamentally an inferential activity, and is an area in which statisticians can make very significant contributions. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0266-4763
Volume :
31
Issue :
8
Database :
Academic Search Index
Journal :
Journal of Applied Statistics
Publication Type :
Academic Journal
Accession number :
15269093
Full Text :
https://doi.org/10.1080/0266476042000270518