1. Data analysis with empirical probability functions as a data mining method: Employing CF-miner and pattern difference quantifiers
- Author
- Milan Simunek, Krzysztof Urbaniec, Ivan Nagy, Milan Sliacky, and Jindrich Borka
- Subjects
Set (abstract data type), Computer science, Histogram, Ticket, Data mining, Type (model theory), Empirical probability, Equivalence (measure theory), Electronic mail, Small set
- Abstract
In this paper we treat data analysis with empirical probability functions as a data mining method. We propose a way to carry out this type of analysis by employing the LISp-Miner system, namely the CF-Miner procedure and pattern difference quantifiers. To confirm that LISp-Miner is a suitable tool for this purpose, we briefly present both methods and then show their equivalence. We do this by providing a theoretical description, which we then support by analysing a small set of data concerning traffic accidents with both methods and comparing the results. Afterwards we provide an example of an analysis of a full data set concerning rail tickets sold at selected stations in 2014. We show that by considering “difference histograms” it is possible to identify remarkable dissimilarities in histograms of the time of ticket sale that would not be found otherwise. Both analyses confirm that the method we propose can provide new and interesting results even if the data have already been analysed.
- Published
- 2018