Gibbs Sampling Subjectively Interesting Tiles
- Source :
- Advances in Intelligent Data Analysis XVIII: 18th International Symposium on Intelligent Data Analysis (IDA 2020), Apr 2020, Konstanz (online), Germany. Lecture Notes in Computer Science, ISBN: 9783030445836. ⟨10.1007/978-3-030-44584-3_7⟩
- Publication Year :
- 2020
- Publisher :
- Springer International Publishing, 2020.
Abstract
- The local pattern mining literature has long struggled with the so-called pattern explosion problem: the size of the set of patterns found exceeds the size of the original data. This causes computational problems (enumerating a large set of patterns will inevitably take a substantial amount of time) as well as problems for interpretation and usability (trawling through a large set of patterns is often impractical). Two complementary research lines aim to address this problem. The first aims to develop better measures of interestingness, in order to reduce the number of uninteresting patterns that are returned [6, 10]. The second aims to avoid an exhaustive enumeration of all 'interesting' patterns (where interestingness is quantified in a more traditional way, e.g. frequency), by sampling directly from this set such that more 'interesting' patterns are sampled with higher probability [2]. Unfortunately, the first research line does not reduce computational cost, while the second may miss out on the most interesting patterns. In this paper, we combine the best of both worlds for mining interesting tiles [8] from binary databases. Specifically, we propose a new pattern sampling approach based on Gibbs sampling, where the probability of sampling a pattern is proportional to its subjective interestingness [6], an interestingness measure reported to better represent true interestingness. The experimental evaluation confirms the theory, but also reveals an important weakness of the proposed approach, which we speculate is shared with any other pattern sampling approach. We thus conclude with a broader discussion of this issue, and a forward look.
- Subjects :
- Technology and Engineering
Computer science
02 engineering and technology
KNOWLEDGE DISCOVERY
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Interpretation (model theory)
Local pattern
Set (abstract data type)
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Gibbs sampling
Pattern mining
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Pattern sampling
[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]
Subjective interestingness
Trawling
Usability
Mathematics and Statistics
Large set (Ramsey theory)
020201 artificial intelligence & image processing
Data mining
Computational problem
Details
- ISBN :
- 978-3-030-44583-6
- ISSN :
- 03029743 and 16113349
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....bfc1aa8c759f6a96fcf128e8025184a5