1. Alice and the Caterpillar: A more descriptive null model for assessing data mining results.
- Author
Preti, Giulia; De Francisci Morales, Gianmarco; Riondato, Matteo
- Subjects
DATA mining, STATISTICAL hypothesis testing, BIPARTITE graphs, CATERPILLARS, MARKOV chain Monte Carlo, BINARY sequences
- Abstract
We introduce novel null models for assessing the results obtained from observed binary transactional and sequence datasets, using statistical hypothesis testing. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to other properties considered by other models. We describe Alice, a suite of Markov chain Monte Carlo algorithms for sampling datasets from our null models, based on a carefully defined set of states and efficient operations to move between them. The results of our experimental evaluation show that Alice mixes fast and scales well, and that our null model finds different significant results than ones previously considered in the literature. [ABSTRACT FROM AUTHOR]
- Published
2024
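The abstract's claim that preserving the Bipartite Joint Degree Matrix (BJDM) also preserves the number of caterpillars can be illustrated with a minimal sketch. It is not taken from the paper and is not part of Alice. It assumes a binary transactional dataset given as a list of item sets, and it assumes that BJDM entry (a, b) counts the edges joining a transaction of length a to an item of support b. The function names `bjdm` and `caterpillars` and the toy dataset are hypothetical.

```python
from collections import Counter


def bjdm(transactions):
    """Count edges by (transaction length, item support).

    Assumed definition: entry (a, b) is the number of (transaction, item)
    edges whose transaction holds a items and whose item occurs in b
    transactions.
    """
    item_sets = [set(t) for t in transactions]
    support = Counter(item for items in item_sets for item in items)
    matrix = Counter()
    for items in item_sets:
        for item in items:
            matrix[(len(items), support[item])] += 1
    return matrix


def caterpillars(matrix):
    """Count paths with three edges in the bipartite graph.

    An edge between endpoints of degrees a and b is the middle edge of
    (a - 1) * (b - 1) such paths, so the total depends only on the BJDM.
    """
    return sum(n * (a - 1) * (b - 1) for (a, b), n in matrix.items())


# Toy dataset (hypothetical): three transactions over items a, b, c.
dataset = [{"a", "b", "c"}, {"a", "b"}, {"c"}]
matrix = bjdm(dataset)
print(dict(matrix))
print(caterpillars(matrix))
```

Because `caterpillars` takes only the matrix as input, any two datasets with the same BJDM get the same count under these assumptions. The sketch does not cover the multigraph case that the abstract mentions for sequence datasets.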