Gibbs Sampling Subjectively Interesting Tiles

Authors :
Tijl De Bie
Jefrey Lijffijt
Anes Bendimerad
Céline Robardet
Marc Plantevit
Berthold, MR
Feelders, A
Krempl, G
Data Mining and Machine Learning (DM2L)
Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS)
Institut National des Sciences Appliquées de Lyon (INSA Lyon)
Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL)
Université de Lyon-École Centrale de Lyon (ECL)
Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon)
Université de Lyon-Université Lumière - Lyon 2 (UL2)
Internet Technology and Data Science Lab (IDLab)
Universiteit Antwerpen [Antwerpen]-Universiteit Gent = Ghent University [Belgium] (UGENT)
Source :
Lecture Notes in Computer Science, ISBN: 9783030445836. Advances in Intelligent Data Analysis XVIII: 18th International Symposium on Intelligent Data Analysis (IDA 2020), Apr 2020, Konstanz (online), Germany. ⟨10.1007/978-3-030-44584-3_7⟩
Publication Year :
2020
Publisher :
Springer International Publishing, 2020.

Abstract

The local pattern mining literature has long struggled with the so-called pattern explosion problem: the size of the set of patterns found exceeds the size of the original data. This causes computational problems (enumerating a large set of patterns will inevitably take a substantial amount of time) as well as problems for interpretation and usability (trawling through a large set of patterns is often impractical). Two complementary research lines aim to address this problem. The first aims to develop better measures of interestingness, in order to reduce the number of uninteresting patterns that are returned [6, 10]. The second aims to avoid an exhaustive enumeration of all 'interesting' patterns (where interestingness is quantified in a more traditional way, e.g. frequency), by directly sampling from this set in a way that more 'interesting' patterns are sampled with higher probability [2]. Unfortunately, the first research line does not reduce computational cost, while the second may miss out on the most interesting patterns. In this paper, we combine the best of both worlds for mining interesting tiles [8] from binary databases. Specifically, we propose a new pattern sampling approach based on Gibbs sampling, where the probability of sampling a pattern is proportional to its subjective interestingness [6], an interestingness measure reported to better represent true interestingness. The experimental evaluation confirms the theory, but also reveals an important weakness of the proposed approach, which we speculate is shared with any other pattern sampling approach. We thus conclude with a broader discussion of this issue, and a forward look.

Details

ISBN :
978-3-030-44583-6
ISSN :
03029743 and 16113349
ISBNs :
9783030445836
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....bfc1aa8c759f6a96fcf128e8025184a5