Back to Search
Start Over
Fast Frequent Patterns Mining by Multiple Sampling With Tight Guarantee Under Bayesian Statistics
- Source :
- IEEE Transactions on Cybernetics. 53:2993-3006
- Publication Year :
- 2023
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2023.
-
Abstract
- Sampling from large dataset is commonly used in the frequent patterns (FPs) mining. To tightly and theoretically guarantee the quality of the FPs obtained from samples, current methods theoretically stabilize the supports of all the patterns in random samples, despite only FPs do matter, so they always overestimate the sample size. We propose an algorithm called multiple sampling-based FPs mining (MSFP). The MSFP first generates the set of approximate frequent items (AFI), and uses the AFI to form the set of approximate FPs without supports ( AFP*), where it does not stabilize the value of any item's or pattern's support, but only stabilizes the relationship ≥ or < between the support and the minimum support, so the MSFP can use small samples to successively obtain the AFI and AFP*, and can successively prune the patterns not contained by the AFI and not in the AFP*. Then, the MSFP introduces the Bayesian statistics to only stabilize the values of supports of AFP*'s patterns. If a pattern's support in the original dataset is unknown, the MSFP regards it as random, and keeps updating its distribution by its approximations obtained from the samples taken in the progressive sampling, so the error probability can be bound better. Furthermore, to reduce the I/O processes in the progressive sampling, the MSFP stores a large enough random sample in memory in advance. The experiments show that the MSFP is reliable and efficient.
- Subjects :
- Computer science
Sampling (statistics)
Value (computer science)
Computer Science Applications
Human-Computer Interaction
Bayesian statistics
Set (abstract data type)
Distribution (mathematics)
Control and Systems Engineering
Sample size determination
Probability of error
Multiple sampling
Electrical and Electronic Engineering
Algorithm
Software
Information Systems
Subjects
Details
- ISSN :
- 21682275 and 21682267
- Volume :
- 53
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Cybernetics
- Accession number :
- edsair.doi.dedup.....b47ac0ea7508e7a345949293d39f683c