552 results on '"GSP Algorithm"'
Search Results
2. Analysis of Sequential Book Loan Data Pattern Using Generalized Sequential Pattern (GSP) Algorithm
- Author
-
Lisdya Anggraini and Tri Astuti
- Subjects
GSP Algorithm ,Apriori algorithm ,Information retrieval ,Association rule learning ,Loan ,Computer science ,Electronic computers. Computer science ,Code (cryptography) ,Information system ,QA75.5-76.95 ,data mining, association rules, apriori algorithm, minimal support, confidence ,Transaction data ,Database transaction - Abstract
As a center for learning and information services, the STMIK Amikom Purwokerto Library is a source of learning and intellectual activity that is very important for the entire academic community in supporting the achievement of the college Tridharma program. Book lending transaction data can yield information that supports decision making when analyzed further. One useful kind of information is the pattern of user behavior in borrowing books, which can be used to keep the stock of related books balanced. This study uses the Generalized Sequential Pattern (GSP) algorithm, which can determine the behavior patterns of users in each transaction and can show relationships or associations between books, whether requested simultaneously or sequentially. The calculations produced 295 frequent sequences consisting of 3 sequence patterns, formed from a minimum support of 0.53%, or a minimum number of books borrowed of 2. Three book items have very strong linkages in book lending transactions, namely book codes 6690, 2026, and 8131.
- Published
- 2019
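The abstract above filters sequences by a minimum support. As a rough illustration of what that support measures, the following sketch counts the fraction of borrower histories that contain a given ordered pattern. It is not the authors' implementation: the names `contains` and `support` and the toy `loans` data are hypothetical (only the three book codes come from the abstract), and GSP's candidate generation and time constraints are omitted.

```python
from typing import List, Set

Sequence = List[Set[str]]  # one borrower's history: an ordered list of loan transactions


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """Return True if `pattern` occurs in `sequence` in order.

    Each pattern element must be a subset of a distinct transaction,
    and the matched transactions must appear in the same order.
    """
    pos = 0  # index of the next pattern element still to be matched
    for transaction in sequence:
        if pos < len(pattern) and pattern[pos] <= transaction:
            pos += 1
    return pos == len(pattern)


def support(database: List[Sequence], pattern: Sequence) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in database) / len(database)


# Hypothetical toy data: four borrowers, each with a short loan history.
loans = [
    [{"6690"}, {"2026"}, {"8131"}],
    [{"6690", "2026"}, {"8131"}],
    [{"2026"}, {"6690"}],
    [{"6690"}, {"8131"}],
]

# Book 6690 borrowed first, book 8131 borrowed in a later transaction.
print(support(loans, [{"6690"}, {"8131"}]))
# Books 6690 and 2026 borrowed together in one transaction.
print(support(loans, [{"6690", "2026"}]))
```

A pattern is kept as frequent when its support reaches the chosen threshold, which in the study above is 0.53%.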
3. Scalable and parallel sequential pattern mining using spark
- Author
-
Xiao Yu, Jin Liu, and Qing Li
- Subjects
Sequence database ,Computer Networks and Communications ,Computer science ,02 engineering and technology ,Parallel computing ,Partition (database) ,GSP Algorithm ,Hardware and Architecture ,020204 information systems ,Spark (mathematics) ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,020201 artificial intelligence & image processing ,Sequential Pattern Mining ,Software - Abstract
The performance of the existing parallel sequential pattern mining algorithms is often unsatisfactory due to high IO overhead and imbalanced load among the computing nodes. To address such problems, this paper proposes two efficient parallel sequential pattern mining algorithms based on Spark, i.e., GSP-S (GSP algorithm based on Spark) and PrefixSpan-S (PrefixSpan algorithm based on Spark). For both algorithms, multiple MapReduce jobs are implemented to complete a mining task. To reduce IO overhead and take advantage of cluster memory, the first MapReduce job loads sequence database from the Hadoop Distributed File System (HDFS) into the Spark resilient distributed datasets (RDDs), and further MapReduce jobs read the database from the RDDs and store intermediate results back into the RDDs. Our findings suggest that a wise choice can be made between GSP-S and PrefixSpan-S, depending on the user-specified minimum support threshold. Moreover, theoretical analysis shows that GSP-S and PrefixSpan-S are sensitive to data distribution on the cluster. To further improve performance, we propose two database partition strategies to balance load among the computing nodes in a cluster. Experiment results demonstrate the high performance of GSP-S and PrefixSpan-S in terms of load-balancing, speedup and scalability.
- Published
- 2018
4. Mining frequent trajectory pattern based on vague space partition.
- Author
-
Wang, Liang, Hu, Kunyuan, Ku, Tao, and Yan, Xiaohui
- Subjects
*TRAJECTORY measurements , *DATA mining , *PROBLEM solving , *COMPUTER algorithms , *APPROXIMATION theory , *MEMBERSHIP - Abstract
Frequent trajectory pattern mining is an important spatiotemporal data mining problem with broad applications. However, it is also a difficult problem due to the approximate nature of spatial trajectory locations. Most of the previously developed frequent trajectory pattern mining methods explore a crisp space partition approach [8,10] to alleviate the spatial approximation concern. However, this approach may cause the sharp boundary problem that spatially close trajectory locations may fall into different partitioned regions, and eventually result in failure of finding meaningful trajectory patterns. In this paper, we propose a flexible vague space partition approach to solve the sharp boundary problem. In this approach, the spatial plane is divided into a set of vague grid cells, and trajectory locations are transformed into neighboring vague grid cells by a distance-based membership function. Based on two classical sequential mining algorithms, the PrefixSpan and GSP algorithms, we propose two efficient trajectory pattern mining algorithms, called VTPM-PrefixSpan and VTPM-GSP, to mine the transformed trajectory sequences with time interval constraints. A comprehensive performance study on both synthetic and real datasets shows that the VTPM-PrefixSpan algorithm outperforms the VTPM-GSP algorithm in both effectiveness and scalability. [Copyright Elsevier]
- Published
- 2013
5. Detection of Crimes Using Unsupervised Learning Techniques
- Author
-
G. Snehal, R. Bulli Babu, and P. Aditya Satya Kiran
- Subjects
Multidisciplinary ,Fuzzy clustering ,business.industry ,Computer science ,Partition problem ,k-means clustering ,Pattern recognition ,computer.software_genre ,Set (abstract data type) ,GSP Algorithm ,Data stream clustering ,CURE data clustering algorithm ,Expectation–maximization algorithm ,Canopy clustering algorithm ,Unsupervised learning ,Affinity propagation ,General Materials Science ,Artificial intelligence ,Data mining ,business ,Cluster analysis ,computer ,FSA-Red Algorithm - Abstract
Objectives: The main objective of this paper is to solve criminal cases in less time. There are many methods for doing so, but this paper concentrates on solving cases easily and reducing the time needed to solve them. Methods: There are many methods for solving criminal cases in less time, but here we use a clustering technique. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). When a case is entered into the database and a similar case already exists, the new case can be solved more easily by following the same procedure. Findings: Previously, cases were filed on an FIR. Nowadays, databases are used to file cases. Each new case is compared with older cases, which makes it easier to find the suspect and takes less time to solve the case. Other techniques, such as classification, were used previously. In my findings and research work, however, clustering is simple, more accurate, and takes less time to solve the case. There are different types of clustering algorithms; in this paper we use the k-means algorithm and the expectation-maximization algorithm. We use these two techniques because both are partitioning algorithms. Partitioning is one of the best methods for solving crimes and for finding and grouping similar data. The k-means algorithm partitions data into groups based on their means. The k-means algorithm has an extension called the expectation-maximization algorithm, in which the data are partitioned based on their parameters. Applications: This system can be used by Indian crime departments to reduce crime and solve crimes in less time.
- Published
- 2017
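The abstract above describes k-means as partitioning data into groups based on their means. A minimal sketch of that idea (Lloyd's iteration) follows. It assumes numeric feature vectors and caller-supplied initial centroids; the function names and the toy `points` are hypothetical and are not taken from the paper, which does not describe its features or implementation.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points: List[Point], centroids: List[Point], iterations: int = 10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters: List[List[Point]] = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print([len(c) for c in clusters])
```

The result depends on the initial centroids, which is why they are passed in explicitly here rather than chosen at random.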
6. Discovery of Study Patterns that Impacts Students’ Discussion Performance in Forum Assignments
- Author
-
Seiji Isotani, Bruno Elias Penteado, Paula Maria Pereira Paiva, Deborah Viviane Ferrari, and Marina Morettin-Zupelari
- Subjects
Computer science ,Learning analytics ,030206 dentistry ,02 engineering and technology ,Course (navigation) ,Active participation ,GSP Algorithm ,03 medical and health sciences ,0302 clinical medicine ,Work (electrical) ,Active learning ,ComputingMilieux_COMPUTERSANDEDUCATION ,0202 electrical engineering, electronic engineering, information engineering ,Mathematics education ,020201 artificial intelligence & image processing ,Clinical case ,Set (psychology) - Abstract
Student-centered courses rely on the active participation of the students in forum assignments. In this work, we investigate a course where the forum assignment discusses a clinical case among professional students (N = 94). We propose a method to discover navigation patterns related to performance grades, using behavioral actions in an LMS platform. We selected a set of significant course actions and built per-user sequences along the course module. Then, we applied the GSP algorithm to identify ordered patterns from this navigational data. The identified patterns were then used as features for a linear regression model, to predict the assignments’ performance, graded manually by the teachers, and controlling for factors that may influence it. Results show some rules correlated to the students’ performances. These results can be used to better inform course designers on how to improve the courseware and instructors on how to better guide their students.
- Published
- 2019
7. Constraint-Based Measures for DNA Sequence Mining using Group Search Optimization Algorithm
- Author
-
Neelu Khare and Kuruva Lakshmanna
- Subjects
General Computer Science ,Optimization algorithm ,Group (mathematics) ,Computer science ,General Engineering ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,DNA sequencing ,Constraint (information theory) ,GSP Algorithm ,010201 computation theory & mathematics ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Algorithm ,FSA-Red Algorithm - Published
- 2016
8. Probabilistic Static Load-Balancing of Parallel Mining of Frequent Sequences
- Author
-
Robert Kessl
- Subjects
Analysis of parallel algorithms ,Computer science ,Data stream mining ,Probabilistic logic ,Parallel algorithm ,Affinity analysis ,02 engineering and technology ,computer.software_genre ,Graph ,Computer Science Applications ,Randomized algorithm ,GSP Algorithm ,Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Probabilistic analysis of algorithms ,Data mining ,computer ,Information Systems - Abstract
Frequent sequence mining is a well-known and well-studied problem in data mining. The output of the algorithm is used in many other areas, such as bioinformatics, chemistry, and market basket analysis. Unfortunately, frequent sequence mining is computationally quite expensive. In this paper, we present a novel parallel algorithm for mining frequent sequences based on static load-balancing. The static load-balancing is done by measuring the computational time using a probabilistic algorithm. For instances of reasonable size, the algorithms achieve speedups of up to $\approx 3/4\cdot P$, where $P$ is the number of processors. In the experimental evaluation, we show that our method performs significantly better than the current state-of-the-art methods. The presented approach is very universal: it can be used for static load-balancing of other pattern mining algorithms, such as itemset/tree/graph mining algorithms.
- Published
- 2016
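To put the reported speedup in context, the usual definitions of speedup and parallel efficiency can be applied. This is an assumption, since the abstract does not define speedup; $T_1$, $T_P$, $S$ and $E$ are notation introduced here, and $P = 16$ is only an example value.

```latex
S(P) = \frac{T_1}{T_P}, \qquad E(P) = \frac{S(P)}{P}, \qquad
S(P) \approx \tfrac{3}{4}\cdot P \;\Longrightarrow\; E(P) \approx 0.75,
\qquad \text{e.g. } P = 16 \;\Longrightarrow\; S \approx 12 .
```

Under that reading, the best reported case corresponds to roughly 75% parallel efficiency.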
9. An Efficient Parallel Association Rules Mining Algorithm for Fault Diagnosis
- Author
-
Kai Ran Zhang, Jing Liu, Shi Yan Fan, Hai Peng Ji, Tai Yong Wang, and Zhi Peng Wang
- Subjects
Apriori algorithm ,Association rule learning ,business.industry ,Computer science ,Mechanical Engineering ,InformationSystems_DATABASEMANAGEMENT ,Machine learning ,computer.software_genre ,Fault (power engineering) ,GSP Algorithm ,Very large database ,ComputingMethodologies_PATTERNRECOGNITION ,Mechanics of Materials ,A priori and a posteriori ,General Materials Science ,Artificial intelligence ,Pruning (decision trees) ,Data mining ,business ,computer ,FSA-Red Algorithm - Abstract
With the development of the Internet industry, the volume of equipment data is increasing. Traditional methods are not suitable for processing large data. To address the inefficiency of the Apriori algorithm when mining very large databases, an efficient parallel association rule mining algorithm based on a cluster, the Advanced Pruning Parallel Apriori Algorithm (APPAA), is presented. The APPAA algorithm can enhance the mining efficiency as well as the system’s extensibility. Experimental results show that the APPAA algorithm cuts the mining time of Apriori by 85% and that it has good parallel and scalable characteristics, so it is suitable for mining very large fault diagnosis databases.
- Published
- 2016
10. An Enhanced Frequent Pattern Growth Based on MapReduce for Mining Association Rules
- Author
-
Yahya E. A. Al-Salhi, Arkan A. G Al-Hamodi, and Songfeng Lu
- Subjects
Association rule learning ,Computer science ,media_common.quotation_subject ,InformationSystems_DATABASEMANAGEMENT ,02 engineering and technology ,computer.software_genre ,Execution time ,GSP Algorithm ,020204 information systems ,Map reduce ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Quality (business) ,Data mining ,Database transaction ,computer ,media_common - Abstract
In mining frequent itemsets, one of the most important algorithms is FP-growth. FP-growth compresses the information needed for mining frequent itemsets into an FP-tree and recursively constructs FP-trees to find all frequent itemsets. In this paper, we propose the EFP-growth (enhanced FP-growth) algorithm to achieve the quality of FP-growth. Our proposed method implements EFP-growth on the MapReduce framework using Hadoop. The new method achieves higher performance than the basic FP-growth. EFP-growth can work with large datasets to discover frequent patterns in a transaction database. With our method, the execution time under different minimum supports is decreased.
- Published
- 2016
11. Analysis and Implementation some of Data Mining Algorithms by Collecting Algorithm based on Simple Association Rules
- Author
-
Nadia Moqbel Hassan
- Subjects
Apriori algorithm ,Weighted Majority Algorithm ,Association rule learning ,business.industry ,Computer science ,Population-based incremental learning ,Machine learning ,computer.software_genre ,GSP Algorithm ,Set (abstract data type) ,Probabilistic analysis of algorithms ,Artificial intelligence ,Data mining ,business ,Algorithm ,computer ,FSA-Red Algorithm - Abstract
analysis is utilized to detect the learning and set up tenets from a huge dataset. The minimum support value in association analysis is a critical factor that influences the performance of this detection. Association rule mining is a data mining method whose objective is to discover interesting association or correlation relationships among a huge set of data elements. In this paper, a new algorithm is proposed which collects the (Sample Association Rules) taken from the (Basic Apriori Algorithm) with the (Multiple Minimum Support utilizing Maximum Constraints Algorithms). The algorithm is executed and compared with the other algorithms using a newly proposed comparison algorithm. Comparisons were made on various groups of data. The results of applying the proposed algorithm indicate faster execution than the other algorithms. Finally, both the execution and the results show the simplicity, accuracy, and speed of the new algorithm, as well as the reliability of the other algorithms.
- Published
- 2016
12. Fast algorithm for high utility pattern mining with the sum of item quantities
- Author
-
Keun Ho Ryu, Heungmo Ryang, and Unil Yun
- Subjects
Computer science ,Binary number ,02 engineering and technology ,Construct (python library) ,computer.software_genre ,Fast algorithm ,Field (computer science) ,Theoretical Computer Science ,GSP Algorithm ,Tree (data structure) ,Task (computing) ,Tree structure ,Artificial Intelligence ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Data mining ,computer - Abstract
In frequent pattern mining, items are considered to have the same importance in a database, and their occurrences are represented as binary values in transactions. In real-world databases, however, items not only have relative importance but are also represented as non-binary values in transactions. High utility pattern mining is one of the most essential issues in the pattern mining field, and recently emerged to address this limitation of frequent pattern mining. Meanwhile, tree construction with a single database scan is significant since a database scan is a time-consuming task. In utility mining, an additional database scan is necessary to identify actual high utility patterns from candidates. In this paper, we propose a novel tree structure, namely the SIQ-Tree (Sum of Item Quantities), which captures database information in a single pass. Moreover, a restructuring method is suggested with strategies for reducing overestimated utilities. The proposed algorithm can construct the SIQ-Tree with only a single scan and effectively decrease the number of candidate patterns through the reduced overestimated utilities, which improves mining performance. Experimental results show that our algorithm outperforms a state-of-the-art one in terms of runtime and the number of generated candidates, with similar memory usage.
- Published
- 2016
13. Multi-objective association rule mining with binary bat algorithm
- Author
-
Anping Song, Wei Cao, Jianjiao Chen, Ke Pu, Xuehai Ding, and Mingbo Li
- Subjects
0209 industrial biotechnology ,Apriori algorithm ,Weighted Majority Algorithm ,Association rule learning ,Computer science ,Population-based incremental learning ,02 engineering and technology ,computer.software_genre ,Theoretical Computer Science ,GSP Algorithm ,020901 industrial engineering & automation ,Artificial Intelligence ,Genetic algorithm ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Data mining ,computer ,Bat algorithm ,FSA-Red Algorithm - Abstract
Association rule mining that must satisfy a variety of measures is regarded as a multi-objective optimization problem rather than a single-objective optimization problem. Traditional multi-objective algorithms such as the genetic algorithm converge slowly and have low efficiency. Furthermore, the rule sets generated by traditional multi-objective algorithms are too large to be efficiently analyzed and explored in any further process. The bat algorithm is a new, efficient global optimization algorithm whose convergence is superior to that of binary particle swarm optimization (BPSO) and the genetic algorithm. This paper discusses the application of a multi-objective bat algorithm to association rule mining. We propose a multi-objective binary bat algorithm (MBBA) based on Pareto for association rule mining. This algorithm is independent of minimum support and minimum confidence. To evaluate the association rules mined by the MBBA algorithm, we propose a new method to discover interesting association rules without favoring or excluding any measure. Compared with the single-objective BPSO, the binary bat algorithm (BBA), and the Apriori algorithm, the experimental results on six datasets show that the new algorithm is feasible and highly effective. It can make up for the shortcomings of single-objective algorithms and traditional association rule mining algorithms.
- Published
- 2016
14. AN EFFICIENT DATA MINING METHOD TO FIND FREQUENT ITEM SETS IN LARGE DATABASE USING TR-FCTM
- Author
-
Saravanan Suba and T. Christopher
- Subjects
Minimum Support ,Apriori algorithm ,lcsh:Computer engineering. Computer hardware ,Association rule learning ,Database ,Computer science ,05 social sciences ,lcsh:TK7885-7895 ,Apriori ,02 engineering and technology ,computer.software_genre ,GSP Algorithm ,Reduction (complexity) ,Set (abstract data type) ,FP-Tree ,020204 information systems ,0502 economics and business ,0202 electrical engineering, electronic engineering, information engineering ,Table (database) ,TR-FCTM ,050211 marketing ,Data mining ,computer ,Database transaction ,FSA-Red Algorithm - Abstract
Mining association rules in large databases is one of the most popular data mining techniques for business decision makers. Discovering frequent item sets is the core process in association rule mining. Numerous algorithms are available in the literature to find frequent patterns. Apriori and FP-tree are the most common methods for finding frequent items. Apriori finds significant frequent items using candidate generation, which requires a larger number of database scans. FP-tree uses two database scans to find significant frequent items without candidate generation. The proposed TR-FCTM (Transaction Reduction-Frequency Count Table Method) discovers significant frequent items by generating the full set of candidates once to form a frequency count table with one database scan. Experimental results show that TR-FCTM outperforms Apriori and FP-tree.
- Published
- 2016
15. Identification of Human Behavior Patterns Based on the GSP Algorithm
- Author
-
Jorge Benítez Hurtado, T. Edwin Fabricio Lozada, Freddy Giancarlo Salazar Carrillo, Joselito Naranjo-Santamaria, Hector F. Gomez A, Richard Eduardo Ruiz Ordoñez, Luis Antonio Llerena, and Teodoro Alvarado Barros
- Subjects
GSP Algorithm ,Identification (information) ,Sequence ,Computer science ,business.industry ,Pattern recognition ,Artificial intelligence ,Artificial vision system ,business - Abstract
An analysis of the algorithms for identifying sequential patterns described in the literature shows that not all are suitable for the type of scenarios with which video surveillance often deals, in particular for the recognition of suspicious behavior patterns. To classify human behavior as normal or suspicious, it is necessary to analyze all the monitored actions. For this reason, the main proposal of this study is a modification of Generalized Sequential Patterns, which we call Generalized Sequential Patterns+memory, which mainly incorporates a module that manages the number of repetitions and combinations of actions (and not only of the sequence) that make up patterns. For the experiments, scenes of theft in supermarkets were recorded with labels representing states that we assume can be recognized by an artificial vision system. The results obtained were analyzed, and their performance was evaluated by comparing them with the results obtained from applying GSP.
- Published
- 2018
16. An overview of data structures and algorithms: case study of us in the vector-space model and mining of frequent item sets using the apriori algorithm
- Author
-
D.L. Nkweteyim
- Subjects
Apriori algorithm ,Computer science ,Property (programming) ,05 social sciences ,InformationSystems_DATABASEMANAGEMENT ,010501 environmental sciences ,computer.software_genre ,Data structure ,01 natural sciences ,GSP Algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,data structures, algorithms, vector-space model, frequent itemsets mining, apriori algorithm ,0502 economics and business ,Vector space model ,A priori and a posteriori ,Data mining ,050207 economics ,Row ,computer ,0105 earth and related environmental sciences ,FSA-Red Algorithm - Abstract
In this paper, we review some commonly used data structures and algorithms. We then review two important problems: the creation of the vector-space model that is widely used in the design of information retrieval systems, and the mining of frequent itemsets using the apriori algorithm. We consider two variations of the apriori algorithm: the first is the classical algorithm, which computes candidate k-itemsets by first joining frequent (k-1)-itemsets to themselves and applying the apriori property to prune the generated candidate k-itemsets; the second avoids the join stage in the classical algorithm and instead generates candidate k-itemsets directly from rows of the transactions database, followed by application of the apriori property to prune each itemset so determined. Finally, we illustrate appropriate data structures and algorithms that, when put together, provide efficient implementations of our solution to the problems mentioned. Keywords: data structures, algorithms, vector-space model, frequent itemsets mining, apriori algorithm
- Published
- 2018
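The classical variant described in the abstract above (join frequent (k-1)-itemsets, then prune with the apriori property) can be sketched as follows. This is an illustration under assumptions, not the paper's implementation. The join here is a simplified union of pairs rather than the prefix-based join often used in practice; after pruning, both yield the same candidates. The function name and the toy itemsets are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Itemset = FrozenSet[str]


def generate_candidates(frequent_prev: Iterable[Itemset], k: int) -> Set[Itemset]:
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    Join step: take the union of two (k-1)-itemsets when it has exactly k items.
    Prune step (apriori property): keep a candidate only if every one of its
    (k-1)-subsets is itself frequent.
    """
    prev = set(frequent_prev)
    candidates: Set[Itemset] = set()
    for a, b in combinations(prev, 2):
        union = a | b
        if len(union) != k:
            continue
        if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
            candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
frequent_2 = [frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")]

# {A,B,D} is pruned because {A,D} is not frequent;
# {B,C,D} is pruned because {C,D} is not frequent.
print(generate_candidates(frequent_2, 3))
```

The surviving candidates still have to be counted against the transactions database before they are accepted as frequent.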
17. A Refined K-Means Technique to Find the Frequent Item Sets
- Author
-
Nagaraju Devarakonda, A. Sarvani, and B. Venugopal
- Subjects
Euclidean distance ,Set (abstract data type) ,GSP Algorithm ,Computer science ,Group (mathematics) ,Cluster (physics) ,Minkowski distance ,k-means clustering ,Cluster analysis ,Algorithm - Abstract
In this paper we show the behaviour of a new k-means algorithm. In k-means clustering, we first take ‘n’ item sets and then group them into k clusters by placing each item set in the cluster with the nearest mean. Traditional k-means clustering depends completely on the initial clusters and can be used only on spherical clusters. Traditional k-means clustering uses the Euclidean distance, but in our paper we replace it with the Minkowski distance and combine it with the Generalized Sequential Pattern algorithm (GSP algorithm) to find the frequent item sets in a sequential data stream. The GSP algorithm, which is based on frequent item sets, scans the database iteratively. The modified k-means clustering reduces complexity and calculations, and the GSP algorithm gives better results than any other algorithm for finding the frequent item sets. The results show that this approach performs better than traditional k-means clustering.
- Published
- 2017
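The abstract above replaces the Euclidean distance with the Minkowski distance. The sketch below shows how the two relate: the Minkowski distance of order p reduces to the Manhattan distance at p = 1 and to the Euclidean distance at p = 2. The function name and sample points are hypothetical, and the abstract does not state which order p the authors use.

```python
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Minkowski distance of order p (p >= 1) between two equal-length vectors."""
    if p < 1:
        raise ValueError("p must be at least 1 for a valid distance")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski(a, b, 1))  # Manhattan distance
print(minkowski(a, b, 2))  # Euclidean distance
print(minkowski(a, b, 3))  # a higher order weights the largest coordinate gap more
```

In a k-means loop, swapping the distance function changes only the assignment step, where each point is placed in the cluster with the nearest centroid.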
18. A distributed frequent itemset mining algorithm using Spark for Big Data analytics
- Author
-
Weiming Shen, Min Liu, Feng Zhang, Yunlong Ma, Feng Gui, and Abdallah Shami
- Subjects
Association rule learning ,Computer Networks and Communications ,Computer science ,business.industry ,Big data ,InformationSystems_DATABASEMANAGEMENT ,computer.software_genre ,GSP Algorithm ,Spark (mathematics) ,Scalability ,Benchmark (computing) ,Pruning (decision trees) ,Data mining ,business ,computer ,Software ,FSA-Red Algorithm - Abstract
Frequent itemset mining is an essential step in the process of association rule mining. Conventional approaches for mining frequent itemsets in big data era encounter significant challenges when computing power and memory space are limited. This paper proposes an efficient distributed frequent itemset mining algorithm (DFIMA) which can significantly reduce the amount of candidate itemsets by applying a matrix-based pruning approach. The proposed algorithm has been implemented using Spark to further improve the efficiency of iterative computation. Numeric experiment results using standard benchmark datasets by comparing the proposed algorithm with the existing algorithm, parallel FP-growth, show that DFIMA has better efficiency and scalability. In addition, a case study has been carried out to validate the feasibility of DFIMA.
- Published
- 2015
19. Frequent Item Sets and Association Rules Mining Algorithm Based on Floyd Algorithm
- Author
-
Zhang Jianli and Zhang Lin
- Subjects
Apriori algorithm ,Association rule learning ,Computer science ,General Chemistry ,Floyd–Warshall algorithm ,Condensed Matter Physics ,computer.software_genre ,Data mining algorithm ,GSP Algorithm ,Computational Mathematics ,General Materials Science ,Data mining ,Electrical and Electronic Engineering ,computer ,FSA-Red Algorithm - Published
- 2015
20. Differentially Private Frequent Itemset Mining via Transaction Splitting
- Author
-
Sen Su, Shengzhi Xu, Fangchun Yang, Xiang Cheng, and Zhengyi Li
- Subjects
Decision support system ,Information privacy ,Data stream mining ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,020207 software engineering ,02 engineering and technology ,computer.software_genre ,Computer Science Applications ,Set (abstract data type) ,GSP Algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,Web mining ,Computational Theory and Mathematics ,Web browsing history ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Differential privacy ,Algorithm design ,Data mining ,Database transaction ,computer ,Information Systems - Abstract
Recently, there has been a growing interest in designing differentially private data mining algorithms. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. In this paper, we explore the possibility of designing a differentially private FIM algorithm which can not only achieve high data utility and a high degree of privacy, but also offer high time efficiency. To this end, we propose a differentially private FIM algorithm based on the FP-growth algorithm, which is referred to as PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and a mining phase. In the preprocessing phase, to improve the utility and privacy tradeoff, a novel smart splitting method is proposed to transform the database. For a given database, the preprocessing phase needs to be performed only once. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method to estimate the actual support of itemsets in the original database. In addition, by leveraging the downward closure property, we put forward a dynamic reduction method to dynamically reduce the amount of noise added to guarantee privacy during the mining process. Through formal privacy analysis, we show that our PFP-growth algorithm is $\epsilon$ -differentially private. Extensive experiments on real datasets illustrate that our PFP-growth algorithm substantially outperforms the state-of-the-art techniques.
- Published
- 2015
21. Closing the Gap
- Author
-
Rainer Gemulla, Iris Miliaraki, Klaus Berberich, and Kaustubh Beedkar
- Subjects
Correctness ,business.industry ,Computer science ,Data stream mining ,Concept mining ,computer.software_genre ,Partition (database) ,GSP Algorithm ,Text mining ,Scalability ,Data mining ,Sequential Pattern Mining ,business ,computer ,Information Systems - Abstract
Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this article, we propose MG-FSM, a scalable algorithm for frequent sequence mining on MapReduce. MG-FSM can handle so-called “gap constraints”, which can be used to limit the output to a controlled set of frequent sequences. Both positional and temporal gap constraints, as well as appropriate maximality and closedness constraints, are supported. At its heart, MG-FSM partitions the input database in a way that allows us to mine each partition independently using any existing frequent sequence mining algorithm. We introduce the notion of ω-equivalency, which is a generalization of the notion of a “projected database” used by many frequent pattern mining algorithms. We also present a number of optimization techniques that minimize partition size, and therefore computational and communication costs, while still maintaining correctness. Our experimental study in the contexts of text mining and session analysis suggests that MG-FSM is significantly more efficient and scalable than alternative approaches.
- Published
- 2015
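As an illustration of the "gap constraints" mentioned in the MG-FSM abstract above, the following Python sketch counts the support of a pattern under a positional maximum-gap constraint. It is only a minimal, single-machine sketch: the function names and the parameter `max_gap` (the number of items that may be skipped between two consecutive matched items) are assumptions made for this example, and none of MG-FSM's MapReduce partitioning is reproduced.

```python
def occurs_with_max_gap(pattern, sequence, max_gap):
    """Return True if `pattern` occurs in `sequence` as a subsequence with at
    most `max_gap` skipped items between consecutive matched items."""
    def match_from(p_idx, last_pos):
        if p_idx == len(pattern):
            return True
        if last_pos is None:
            # The first pattern item may match anywhere in the sequence.
            start, stop = 0, len(sequence)
        else:
            # Later items must appear within max_gap + 1 positions of the last match.
            start = last_pos + 1
            stop = min(len(sequence), last_pos + max_gap + 2)
        for pos in range(start, stop):
            if sequence[pos] == pattern[p_idx] and match_from(p_idx + 1, pos):
                return True
        return False

    return match_from(0, None)


def support(pattern, database, max_gap):
    """Number of sequences in `database` that contain `pattern` under the gap constraint."""
    return sum(occurs_with_max_gap(pattern, seq, max_gap) for seq in database)


# Hypothetical toy database of three sequences.
database = [["a", "b", "c"], ["a", "x", "b"], ["a", "x", "y", "b"]]
print(support(["a", "b"], database, max_gap=0))
print(support(["a", "b"], database, max_gap=2))
```

Tightening `max_gap` shrinks the set of sequences that support a pattern, which is how such a constraint limits the output to a controlled set of frequent sequences.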
22. The Novel Approach based on Improving Apriori Algorithm and Frequent Pattern Algorithm for Mining Association Rule
- Author
-
R B S Yadav and Mohammad Shahnawaz Nasir
- Subjects
Apriori algorithm ,Association rule learning ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,computer.software_genre ,GSP Algorithm ,Set (abstract data type) ,ComputingMethodologies_PATTERNRECOGNITION ,Knowledge extraction ,Pruning (decision trees) ,Data mining ,K-optimal pattern discovery ,Algorithm ,computer ,FSA-Red Algorithm - Abstract
Effective mining of association rules is a significant topic in Knowledge Discovery in Databases (KDD). The Apriori algorithm is a classical algorithm for mining association rules. This paper presents an improved method for the Apriori and Frequent Pattern algorithms to increase the efficiency of generating association rules. The algorithm adopts a new method to decrease the redundant generation of sub-itemsets while pruning the candidate itemsets, which can directly form the set of frequent itemsets and, at the same time, remove candidates that have an infrequent subset. The algorithm can raise the probability of obtaining information when scanning the database and reduce the potential scale of itemsets.
- Published
- 2015
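The abstract above mentions removing candidates that have an infrequent subset. The sketch below shows the classical Apriori join-and-prune step that this idea builds on; it is not the improved method of the paper, and the function name `generate_candidates` is an assumption made for this example.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates and prune every candidate
    that has at least one k-subset which is not frequent (downward closure)."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != len(a) + 1:
                continue  # a and b do not differ in exactly one item
            # Prune step: every k-subset of the candidate must itself be frequent.
            if all(frozenset(s) in frequent_k for s in combinations(union, len(a))):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
frequent_2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}
print(generate_candidates(frequent_2))
```

In this toy input only `{a, b, c}` survives, because `{a, b, d}` and `{b, c, d}` each contain a 2-subset that is not frequent.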
23. Extracting Frequent Sequences from Web Log Data using Sequence Tree Algorithm
- Author
-
Jayashree Jha, Dipshree Dhage, Priyanka Baraskar, Supriya Chavan, and Samruddhi Giri
- Subjects
Web analytics ,medicine.medical_specialty ,Computer science ,business.industry ,GSP Algorithm ,World Wide Web ,Web mining ,medicine ,Web navigation ,Web mapping ,Web intelligence ,business ,Algorithm ,Web modeling ,Data Web - Abstract
Sequential pattern mining is the process of applying data mining techniques to large web data repositories. With the extensive use of the Internet, discovery and analysis of useful information from the World Wide Web has become a practical necessity. Data mining techniques are applied to a sequential database to discover the correlations that exist among an ordered list of events. In this kind of mining, hidden data is extracted to obtain useful information that helps in understanding the browsing patterns of users. Web usage mining is a data mining method that can be used to recommend web usage patterns with the help of users' sessions and behaviour. The aim of discovering frequent sequential patterns in Web log data is to obtain information about the access behaviour of users. It also helps to understand the buying patterns of existing customers. This paper focuses on the performance of the sequence tree algorithm, which is better than that of the Generalized Sequential Pattern (GSP) algorithm. It emphasizes the running time of the sequence tree algorithm and its ability to discover more patterns than the standard GSP algorithm. With the advancement of information technology, usage of the World Wide Web is increasing day by day and is becoming a necessity. Innumerable visitors throughout the world interact with the web daily. Different kinds of data have to be organized so that they can be accessed effectively by many users. Web mining is the application of data mining technologies to extract information from huge Web data repositories. Web mining can be broadly classified into three major parts: Web content mining, Web usage mining, and Web structure mining.
- Published
- 2015
24. Mining frequent closed inter-sequence patterns efficiently using dynamic bit vectors
- Author
-
Bac Le, Bay Vo, and Minh-Thai Tran
- Subjects
GSP Algorithm ,Scheme (programming language) ,Sequence ,Artificial Intelligence ,Data stream mining ,Computer science ,Trie ,Data mining ,Space (commercial competition) ,computer.software_genre ,computer ,Database transaction ,computer.programming_language - Abstract
Mining frequent sequences is a critical stage before rule generation for sequence databases. Currently, there are two main ways for mining frequent sequences, namely intra-sequence mining and inter-sequence mining. Inter-sequence mining is more attractive than intra-sequence mining because it considers the relationship between sequences in transactions. However, mining all possible frequent inter-sequences takes a long time and requires a lot of memory. Mining frequent closed inter-sequences is efficient because such sequences are compact, and only the necessary information is maintained. CISP-Miner was proposed for mining frequent closed inter-sequence patterns, but it consumes a lot of memory since many closed patterns are mined. This paper proposes an algorithm called ClosedISP for mining frequent closed inter-sequence patterns. The proposed algorithm uses a checking scheme for early eliminating and checking closed patterns without candidate maintenance. ClosedISP uses a dynamic bit vector that combines transaction information to compress data. In addition, ClosedISP adopts a prefix tree and a depth-first search order to reduce the search space and generate non-redundant sequential rules efficiently. Experiments were conducted to compare the proposed algorithm with CISP-Miner to demonstrate the effectiveness of the proposed algorithm in terms of runtime and memory usage.
- Published
- 2015
25. An efficient and effective algorithm for mining top-rank-k frequent patterns
- Author
-
Bac Le, Quyen Huynh-Thi-Le, Tuong Le, and Bay Vo
- Subjects
Structure (mathematical logic) ,Speedup ,Computer science ,Rank (computer programming) ,General Engineering ,Process (computing) ,computer.software_genre ,Computer Science Applications ,Effective algorithm ,GSP Algorithm ,Ranking ,Artificial Intelligence ,Data mining ,computer - Abstract
Using N-list structure for mining top-rank-k frequent patterns effectively. Subsume concept was also used to speed up the runtime of the mining process. The experiment was conducted to show the effectiveness of the proposed algorithm. Frequent pattern mining generates a lot of candidates, which requires a lot of memory usage and mining time. In real applications, a small number of frequent patterns are used. Therefore, the mining of top-rank-k frequent patterns, which limits the number of mined frequent patterns by ranking them in frequency, has received increasing interest. This paper proposes the iNTK algorithm, which is an improved version of the NTK algorithm, for mining top-rank-k frequent patterns. This algorithm employs an N-list structure to represent patterns. The subsume concept is used to speed up the process of mining top-rank-k patterns. The experiments are conducted to evaluate iNTK and NTK in terms of mining time and memory usage for eight datasets. The experimental results show that iNTK is more efficient and faster than NTK.
- Published
- 2015
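To make the notion of ranking patterns by frequency in the abstract above concrete, here is a brute-force Python sketch. It assumes that patterns with equal support share a rank and that the result contains every pattern whose support is among the k highest distinct support values; this reading, and the function name `top_rank_k`, are assumptions for illustration. The exhaustive enumeration is only practical for tiny inputs and does not use the N-list structure or the subsume concept of iNTK.

```python
from collections import Counter
from itertools import combinations


def top_rank_k(transactions, k):
    """Map rank (1 = highest support) to the patterns holding that rank, for the
    k highest distinct support values. Brute force, for tiny inputs only."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    top_supports = sorted(set(counts.values()), reverse=True)[:k]
    return {
        rank: sorted(p for p, s in counts.items() if s == sup)
        for rank, sup in enumerate(top_supports, start=1)
    }


# Hypothetical toy transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"]]
print(top_rank_k(transactions, k=2))
```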
26. High Speed Database Sequence Comparison
- Author
-
Bassel Soudan and Talal Bonny
- Subjects
Sequence ,sequence comparison ,Database ,Sequence analysis ,Computer science ,Needleman-Wunsch ,Needleman–Wunsch algorithm ,computer.software_genre ,GSP Algorithm ,Set (abstract data type) ,General Earth and Planetary Sciences ,Data mining ,Sequence Analysis ,computer ,Alignment-free sequence analysis ,Blossom algorithm ,General Environmental Science - Abstract
Database sequence comparison applications compare a query sequence with each sequence in a database to find the closest match. These applications are high consumers of computation time because they use dynamic programming algorithms to perform the large number of required sequence comparisons. Traditional methods perform the comparisons on the entire set of sequences in the database. In this work, we introduce a novel high-speed technique that reduces the number of database sequences to which the time-consuming matching algorithm is applied. The selection of the target database sequences is based on similarity measures that will be introduced in this contribution as well. Using the proposed technique and the proposed similarity measures, we are able to accelerate the database sequence comparison by 65% compared to traditional exhaustive methods.
- Published
- 2015
27. An Efficient Count Based Transaction Reduction Approach for Mining Frequent Patterns
- Author
-
A. Pethalakshmi and V. Vijayalakshmi
- Subjects
Apriori algorithm ,Association Rule ,Association rule learning ,Computer science ,Transaction Reduction Technique ,InformationSystems_DATABASEMANAGEMENT ,Apriori ,Support Count ,computer.software_genre ,Frequent Item Set ,GSP Algorithm ,Reduction (complexity) ,General Earth and Planetary Sciences ,Data mining ,Database transaction ,computer ,General Environmental Science ,FSA-Red Algorithm - Abstract
The Apriori algorithm is a classical algorithm of association rule mining and is widely used for generating frequent item sets. This classical algorithm is inefficient because it scans the database many times, and if the database is large, scanning it takes too much time. To overcome these limitations, researchers have made many improvements to Apriori. This paper analyses the classical algorithm as well as some disadvantages of the improved Apriori variants, and proposes two new transaction reduction techniques for mining frequent patterns in large databases. In this approach, the whole database is scanned only once and the data is compressed in the form of a Bit Array Matrix. The frequent patterns are then mined directly from this matrix. The approach also adopts a new count-based transaction reduction and support count method for candidates. Appropriate operations are designed and performed on matrices to achieve efficiency. All the algorithms are executed at support levels of 5% to 25% and the results are compared. Efficiency is demonstrated through performance analysis.
- Published
- 2015
28. Research of Improved FP-Growth Algorithm in Association Rules Mining
- Author
-
Shiqun Yin, Jiangyue Liu, Yi Zeng, and Miao Zhang
- Subjects
Weighted Majority Algorithm ,Article Subject ,Dinic's algorithm ,Computer science ,Population-based incremental learning ,computer.software_genre ,Hybrid algorithm ,Computer Science Applications ,GSP Algorithm ,QA76.75-76.765 ,Algorithmics ,In-place algorithm ,Computer software ,Data mining ,Algorithm ,computer ,Software ,FSA-Red Algorithm - Abstract
Association rules mining is an important technology in data mining. The FP-Growth (frequent-pattern growth) algorithm is a classical algorithm in association rules mining. However, FP-Growth needs to scan the database twice during mining, which reduces the efficiency of the algorithm. Through the study of association rules mining and the FP-Growth algorithm, we worked out two improved algorithms: the Painting-Growth algorithm and the N (not) Painting-Growth algorithm (which removes the painting steps and achieves the same result in another way). We compared the two improved algorithms with the FP-Growth algorithm. Experimental results show that Painting-Growth algorithm is more than 1050 and N Painting-Growth algorithm is less than 10000 in data volume; the performance of the two improved algorithms is better than that of the FP-Growth algorithm.
- Published
- 2015
29. Mining high utility itemsets for transaction deletion in a dynamic database
- Author
-
Guo-Cheng Lan, Tzung-Pei Hong, and Chun-Wei Lin
- Subjects
Computer science ,InformationSystems_DATABASEMANAGEMENT ,Downward closure property ,Binary number ,computer.software_genre ,Theoretical Computer Science ,GSP Algorithm ,Artificial Intelligence ,Dynamic database ,Dummy variable ,Batch processing ,Computer Vision and Pattern Recognition ,Data mining ,computer ,Database transaction ,Utility mining - Abstract
Association-rule mining is used to mine the relationships among the occurrences of itemsets in a transactional database. An item is treated as a binary variable whose value is one if it appears in a transaction and zero otherwise. In real-world applications, several products may be purchased at the same time, with each product having an associated profit, quantity, and price. Association-rule mining from a binary database is thus not sufficient in some applications. Utility mining was thus proposed as an extension of frequent-itemset mining for considering various factors from the user. Most utility mining approaches can only process static databases and use batch processing. In real-world applications, transactions are dynamically inserted into or deleted from databases. The Fast UPdated (FUP) algorithm and the FUP2 algorithm were respectively proposed to handle transaction insertion and deletion in dynamic databases. In this paper, a fast-updated high-utility itemsets for transaction deletion (FUP-HUI-DEL) algorithm is proposed to handle transaction deletion for efficiently updating discovered high utility itemsets in decremental mining. The two-phase approach in high utility mining is applied to the proposed FUP-HUI-DEL algorithm for preserving the downward closure property to reduce the number of candidates. The FUP2 algorithm for handling transaction deletion in association-rule mining is adopted in the proposed FUP-HUI-DEL algorithm to reduce the number of scans of the original database in high utility mining. Experiments show that the proposed FUP-HUI-DEL algorithm outperforms the batch two-phase approach.
- Published
- 2015
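The quantities behind utility mining in the abstract above can be sketched in a few lines of Python. The example assumes the usual definitions: the utility of an item in a transaction is its quantity times its unit profit, and the transaction-weighted utility (TWU) of an itemset sums the total utility of every transaction containing it, which is the upper bound the two-phase approach uses to keep the downward closure property. The data, the `profit` table, and the function names are hypothetical, and the incremental deletion handling of FUP-HUI-DEL is not reproduced.

```python
def transaction_utility(transaction, profit):
    """Total utility of one transaction given as {item: quantity}."""
    return sum(qty * profit[item] for item, qty in transaction.items())


def itemset_utility(itemset, database, profit):
    """Utility of `itemset`, summed over the transactions that contain all of it."""
    total = 0
    for t in database:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def twu(itemset, database, profit):
    """Transaction-weighted utility: an upper bound on the utility of `itemset`
    and of all its supersets."""
    return sum(
        transaction_utility(t, profit)
        for t in database
        if all(item in t for item in itemset)
    )


# Hypothetical unit profits and quantity database.
profit = {"a": 5, "b": 2, "c": 1}
database = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"b": 1, "c": 1}]
print(itemset_utility({"a"}, database, profit), twu({"a"}, database, profit))
```

Because the TWU of an itemset is never smaller than its utility, an itemset whose TWU falls below the threshold can be discarded together with all of its supersets.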
30. The Research of Generation Algorithm of Frequent Itemsets in High-Dimensional Data
- Author
-
Qing Chao Jiang
- Subjects
Clustering high-dimensional data ,Apriori algorithm ,Association rule learning ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,General Medicine ,computer.software_genre ,GSP Algorithm ,Key (cryptography) ,Logical matrix ,Data mining ,computer ,Algorithm ,FSA-Red Algorithm - Abstract
In the mining of association rules, the generation of frequent itemsets is a key factor that influences the efficiency and performance of the algorithm. As the data dimension increases, traditional association rule mining algorithms clearly cannot meet the demands of high-dimensional data mining. On the basis of the Apriori algorithm, we put forward the Split Mtrix _Apriori algorithm in this paper. By generating the Boolean matrix of the database, Split Mtrix _Apriori reduces the number of database scans needed to generate the frequent itemsets. By adopting a grouping strategy on the Boolean matrix, the algorithm remains efficient when dealing with high-dimensional data. Split Mtrix _Apriori therefore improves the efficiency of association rule mining significantly.
- Published
- 2015
31. A Novel and Improved Apriori Algorithm
- Author
-
Dong Juan Gu and Lei Xia
- Subjects
GSP Algorithm ,Apriori algorithm ,Matrix (mathematics) ,Association rule learning ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,General Medicine ,Data mining ,computer.software_genre ,computer ,Database transaction ,FSA-Red Algorithm - Abstract
The Apriori algorithm is the classical algorithm for mining association rules. Because it needs to scan the database many times, it runs slowly. To improve the running efficiency, this paper improves the Apriori algorithm based on an analysis of Apriori. The idea is to transform the transaction database into a corresponding 0-1 matrix, in which the inner product of each vector with each subsequent vector gives the support. The support is compared with the given minimum support, and rows and columns whose vectors fall below the minimum support are deleted, which reduces the size of the matrix and improves the running speed. Because the improved algorithm needs to scan the database only once, it runs faster. The experiment also shows that the improved algorithm is efficient and feasible.
- Published
- 2014
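The 0-1 matrix idea in the abstract above can be sketched as follows: each item becomes a 0-1 vector over the transactions, and the inner product of two item vectors is the support of the corresponding pair. This is a minimal sketch restricted to pairs; the function names are assumptions for this example and the paper's row and column deletion is simplified to dropping infrequent item vectors.

```python
def to_binary_matrix(transactions):
    """One scan of the database: build a 0-1 vector per item, one entry per transaction."""
    items = sorted({item for t in transactions for item in t})
    return {item: [1 if item in t else 0 for t in transactions] for item in items}


def pair_support(matrix, a, b):
    """Support of the pair {a, b} as the inner product of the two item vectors."""
    return sum(x * y for x, y in zip(matrix[a], matrix[b]))


def frequent_pairs(transactions, min_support):
    """Frequent 2-itemsets with their support counts."""
    matrix = to_binary_matrix(transactions)
    # Drop item vectors whose own support is already below the threshold.
    frequent_items = [i for i in matrix if sum(matrix[i]) >= min_support]
    result = {}
    for idx, a in enumerate(frequent_items):
        for b in frequent_items[idx + 1:]:
            s = pair_support(matrix, a, b)
            if s >= min_support:
                result[(a, b)] = s
    return result


# Hypothetical toy transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(frequent_pairs(transactions, min_support=2))
```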
32. WITHDRAWN: Biological Sequence Pattern Mining Algorithm Based on Data Index Technology
- Author
-
Jiadong Ren and Weina Li
- Subjects
0301 basic medicine ,Index (economics) ,Computer science ,Health Informatics ,02 engineering and technology ,computer.software_genre ,Data mining algorithm ,Sequence pattern ,GSP Algorithm ,03 medical and health sciences ,030104 developmental biology ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,computer - Published
- 2017
33. Sequential Mining Classification
- Author
-
Carine Bou Rjeily, Georges Badr, Emmanuel Andrès, and Amir Hajjam El Hassani
- Subjects
Apriori algorithm ,Computer science ,business.industry ,Data stream mining ,Concept mining ,02 engineering and technology ,computer.software_genre ,Machine learning ,GSP Algorithm ,Tree (data structure) ,Statistical classification ,ComputingMethodologies_PATTERNRECOGNITION ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,Artificial intelligence ,business ,K-optimal pattern discovery ,computer ,FSA-Red Algorithm - Abstract
Sequential pattern mining is a data mining technique that aims to extract and analyze frequent subsequences from sequences of events or items with time constraints. Sequence data mining was introduced in 1995 with the well-known Apriori algorithm. The algorithm studied transactions through time in order to extract frequent patterns from the sequences of products related to a customer. Later, this technique became useful in many applications: DNA research, medical diagnosis and prevention, telecommunications, etc. GSP, SPAM, SPADE, PrefixSpan and other advanced algorithms followed. In view of the evolution of data mining techniques based on sequential data, this paper discusses the multiple extensions of sequential pattern mining algorithms. We classified the algorithms into sequential pattern mining, sequential rule mining and sequence prediction, with their extensions. The classification is presented in a tree at the end of the paper.
- Published
- 2017
34. An Improved Algorithm for Frequent Itemsets Mining
- Author
-
Hao Jiang and Xu He
- Subjects
GSP Algorithm ,Computer science ,Improved algorithm ,0202 electrical engineering, electronic engineering, information engineering ,020206 networking & telecommunications ,020201 artificial intelligence & image processing ,02 engineering and technology ,Data mining ,Space (commercial competition) ,Data structure ,computer.software_genre ,computer ,FSA-Red Algorithm - Abstract
Based on the classical FP-growth algorithm about frequent itemsets mining, this paper proposes a more efficient non-recursive FPNR-growth algorithm and corresponding data structure. The experimental results show that the FPNR-growth algorithm is superior to the FP-growth algorithm, both in mining time and in storage space.
- Published
- 2017
35. Improvement of ID3 Algorithm Implementation
- Author
-
Ming-hao Liu, Wei Shang, and Nan Liu
- Subjects
Incremental decision tree ,Weighted Majority Algorithm ,Computer science ,business.industry ,Decision tree learning ,Population-based incremental learning ,Decision tree ,ID3 algorithm ,020206 networking & telecommunications ,02 engineering and technology ,Machine learning ,computer.software_genre ,GSP Algorithm ,C4.5 algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Data mining ,business ,computer ,FSA-Red Algorithm - Abstract
The ID3 algorithm is a classical decision tree algorithm that is commonly used in data mining. Because spatial information data is large and diverse, when ID3 is used to mine spatial information, some nodes encountered while generating the decision tree may contain special sample sets in which the information gain of every classification attribute is 0 while the values of the result attribute are not unique. In that case, the conventional implementation of ID3 cannot guarantee that the decision tree is generated. Based on this, the conventional implementation of ID3 is improved and the fault tolerance of the implementation is enhanced. The improved implementation can mine spatial data sets.
- Published
- 2017
36. Mining Web Access Sequence with Improved Apriori Algorithm
- Author
-
Haoxiang Huang, Xiaohui Jin, and Jun Yang
- Subjects
Apriori algorithm ,Association rule learning ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,020206 networking & telecommunications ,02 engineering and technology ,computer.software_genre ,GSP Algorithm ,Set (abstract data type) ,ComputingMethodologies_PATTERNRECOGNITION ,Web page ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Algorithm design ,Data mining ,computer ,FSA-Red Algorithm - Abstract
The Apriori algorithm is a classic mining algorithm that can mine association rules and sequential patterns. However, when the Apriori algorithm is applied to contiguous sequential pattern mining, it is inefficient. In web log mining, the contiguous sequential pattern can better represent the semantic information of the user's access to the site because of the continuity of the user's visits to the site's pages. Contiguous sequential patterns can be used not only to predict the user's next access request, but also to improve the site topology and to place advertising pages. When mining contiguous sequence patterns, the Apriori algorithm generates a large number of candidates and scans the transaction database frequently. In this paper, we present an improved algorithm, which we call the AC-Apriori algorithm, based on the Apriori algorithm. The AC-Apriori algorithm reduces the number of scans of the transaction database while preserving the full mining effect, which reduces the runtime and improves the mining efficiency compared with the Apriori algorithm.
- Published
- 2017
37. Stock Sequence Pattern Mining Method Based on SWI-GSP Algorithm
- Author
-
Huachun Liu and Hua Du
- Subjects
Stock trading ,Association rule learning ,020206 networking & telecommunications ,02 engineering and technology ,Sequence pattern ,Combinatorics ,GSP Algorithm ,Data association ,Sliding window protocol ,0202 electrical engineering, electronic engineering, information engineering ,Sliding time window ,020201 artificial intelligence & image processing ,Algorithm ,Stock (geology) ,Mathematics - Abstract
In the mining of stock data associations, investors are most concerned with association rules of the form X (t1)→Y (t2): the X shares rise on day t1, and the Y shares rise with a certain probability on day t2. Because the GSP algorithm cannot mine such rules directly, the GSP algorithm is analyzed in depth and improved, and the SWI-GSP (sliding window interval) algorithm is proposed. In the SWI-GSP algorithm, the association rule X (t1)→Y (t2) (N = t2 - t1) is implemented by adding the time interval parameter N (N = 0, 1, 2, 3, ...). A sliding time window W is designed, the time interval N is counted within W, and W moves along the time axis of stock trading to carry out the correlation counting in a transaction. Experiments on sample Chinese A-share data show that the SWI-GSP algorithm mines sequential patterns with time intervals better than GSP.
- Published
- 2017
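The rule form X (t1)→Y (t2) with interval N = t2 - t1 from the abstract above can be illustrated with a small counting routine. This is a simplified sketch, not the SWI-GSP algorithm: it scans the whole trading history directly instead of using the sliding window W, and the function name `interval_rule_stats` and the input layout (one set of rising stocks per trading day) are assumptions for this example.

```python
def interval_rule_stats(rises, x, y, n):
    """For the rule "x rises on day t -> y rises on day t + n", return
    (number of days on which the rule holds, confidence of the rule).

    `rises` is a list with one set per trading day holding the stocks that rose."""
    x_days = 0
    both = 0
    # Only days t for which day t + n exists can be evaluated.
    for t in range(len(rises) - n):
        if x in rises[t]:
            x_days += 1
            if y in rises[t + n]:
                both += 1
    confidence = both / x_days if x_days else 0.0
    return both, confidence


# Hypothetical five-day history of rising stocks.
rises = [{"X"}, {"Y"}, {"X", "Y"}, {"Y"}, {"X"}]
print(interval_rule_stats(rises, "X", "Y", n=1))
print(interval_rule_stats(rises, "X", "Y", n=2))
```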
38. Research on Mining Cloud Data Based on Correlation Dimension Feature
- Author
-
Jingwen Tu and Li Tao
- Subjects
business.industry ,Data stream mining ,Computer science ,Pattern recognition ,computer.software_genre ,GSP Algorithm ,Data stream clustering ,CURE data clustering algorithm ,Canopy clustering algorithm ,Affinity propagation ,Artificial intelligence ,Data mining ,business ,Cluster analysis ,computer ,Computer Science::Databases ,FSA-Red Algorithm - Abstract
A large hierarchical cloud storage database holds massive, distributed, non-continuous data with strongly coupled nonlinear characteristics, which makes mining with traditional methods difficult. This paper proposes a mining algorithm for non-continuous hierarchical cloud data and analyzes the overall data mining model. The fuzzy C-means clustering algorithm is used to cluster semantic ontology feature points on the basis of semantic feature extraction and quantization encoding, yielding an improved data mining algorithm. The experimental results show that the improved algorithm mines non-continuous hierarchical data with high precision, good performance, and strong anti-interference ability, and that it outperforms the traditional method.
- Published
- 2017
39. Effective algorithm for frequent pattern mining
- Author
-
C. K. Lakshmikanth, S. P. Aditya, M. Hemanth, and K R Suneetha
- Subjects
020203 distributed computing ,Apriori algorithm ,Association rule learning ,Computer science ,02 engineering and technology ,computer.software_genre ,GSP Algorithm ,Web mining ,020204 information systems ,Web page ,0202 electrical engineering, electronic engineering, information engineering ,Data pre-processing ,Data mining ,computer ,FSA-Red Algorithm - Abstract
The Apriori algorithm is an influential data mining algorithm that generates a list of the most frequently visited web pages. Because database contents change quickly, an algorithm that works in real time is needed. The major drawback of the Apriori algorithm is that it needs to scan the main database every time it generates frequent patterns, which increases memory usage and execution time; a lot of research has therefore been conducted to improve it. Towards improving Apriori, a modified version is proposed in this paper to generate frequent patterns. It enables finding patterns without going back to the database at every pass. This limits the number of scans, and the total number of combinations is brought down from 2N to 2N-2. This considerably reduces memory usage as well as execution time and makes real-time pattern discovery possible.
- Published
- 2017
40. Improvement of Apriori Algorithm Based on Vector and Vertical Array
- Author
-
Zhen-yu Guo and Tian-huang Chen
- Subjects
GSP Algorithm ,Apriori algorithm ,Improved algorithm ,Time efficiency ,Pattern matching ,Data mining ,computer.software_genre ,Time complexity ,Algorithm ,computer ,Vertical array ,FSA-Red Algorithm ,Mathematics - Abstract
In association analysis, the classic Apriori algorithm for discovering frequent item sets may scan the source database multiple times, produce a large number of candidates, and perform pattern matching repeatedly, which leads to low time efficiency. Based on an analysis of the array-based algorithm, an improved algorithm is proposed in this paper. The main idea is to scan the source database once and use vector arrays and vertical arrays to represent the transactions, and to improve the join step and the prune step used when candidate frequent (k+1)-item sets are generated from frequent k-item sets, as well as the pattern matching strategy. The experimental results show that the time complexity of the improved algorithm is reduced greatly.
- Published
- 2017
41. Hp-Apriori: Horizontal parallel-apriori algorithm for frequent itemset mining from big data
- Author
-
Mehdi Mansouri and Mohammad-Hossein Nadimi-Shahraki
- Subjects
Apriori algorithm ,Speedup ,Association rule learning ,Data stream mining ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,02 engineering and technology ,computer.software_genre ,GSP Algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Algorithm design ,Data mining ,computer ,FSA-Red Algorithm - Abstract
Due to the large scale and complexity of big data, mining it on a single personal computer is a difficult problem. As databases grow, parallel computing systems can offer considerable advantages to data mining applications. Parallelization of association rule mining algorithms is an important task in data mining for mining frequent patterns from transaction databases. These algorithms either distribute the database horizontally or increase the number of CPUs to reduce the execution time of frequent pattern mining. In this paper, a novel frequent itemset mining algorithm, namely Horizontal parallel-Apriori (HP-Apriori), is proposed. It divides the database both horizontally and vertically and partitions the mining process into four sub-processes so that all four tasks are performed in parallel. HP-Apriori also tries to speed up the mining process with an index file that is generated in the first step of the algorithm. The proposed algorithm has been compared with Count Distribution (CD) in terms of execution time and speedup on four real datasets. Experimental results demonstrate that HP-Apriori outperforms CD by minimizing execution time and maximizing speedup with high scalability.
- Published
- 2017
42. The application of matrix Apriori algorithm in web log mining
- Author
-
Hanshi Wang, Hanxiao Zhang, Lizhen Liu, and Wei Song
- Subjects
Apriori algorithm ,Association rule learning ,Computer science ,business.industry ,020209 energy ,Big data ,InformationSystems_DATABASEMANAGEMENT ,02 engineering and technology ,computer.software_genre ,GSP Algorithm ,Web mining ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Algorithm design ,Data mining ,business ,computer ,FSA-Red Algorithm - Abstract
With the advent of the big data era, data mining technology has gradually matured, and association rule analysis is applied in many fields. Web log mining is an important way to provide personalized services and achieve Web personalization. The Apriori algorithm is a classical association rule algorithm, but it has many shortcomings. In recent years, improvements to the Apriori algorithm have emerged continuously. In this paper, we mainly discuss the application of the Matrix Apriori algorithm, which is based on matrix storage, in Web log mining. First, we analyze the improvements of Matrix Apriori and describe the process of the algorithm. We compare several association rule algorithms. Then, the Matrix Apriori algorithm is applied to a Sogou search log and a shoe website search log. Finally, according to the results of the Web log mining, we can make personalized recommendations and optimize site settings.
- Published
- 2017
43. Frequent Itemset Mining in Vertical Layout with E-ACO Algorithm: In Super Market
- Author
-
M. Sathya and K. Thangadurai
- Subjects
Apriori algorithm ,Association rule learning ,business.industry ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,Affinity analysis ,Machine learning ,computer.software_genre ,Running time ,GSP Algorithm ,A priori and a posteriori ,Artificial intelligence ,Data mining ,business ,computer ,Algorithm ,Access time ,FSA-Red Algorithm - Abstract
Association rule mining is a popular technique in data mining. The Apriori algorithm is the one most often used for market basket analysis, but it has some limitations, such as scanning the database again and again to find frequent itemsets and a long access time. These limitations are reduced by using vertical datasets in the Eclat algorithm together with the ACO technique; the combination is called the E-ACO algorithm. This paper discusses the memory- and time-saving ideas behind the proposed approach. Our results indicate that the technique performs well in terms of memory use, running time, and the number of rules generated.
- Published
- 2017
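The vertical layout that Eclat relies on, as mentioned in the abstract above, stores for each item the set of transaction ids containing it, so that the support of an itemset is the size of an intersection of these sets. The following Python sketch shows plain Eclat on that layout; the ACO component of E-ACO is not reproduced, and the function names are assumptions for this example.

```python
def to_vertical(transactions):
    """Convert a horizontal database into {item: set of transaction ids}."""
    vertical = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(prefix, items, min_support, out):
    """Depth-first search; `items` is a list of (item, tidset) pairs that can
    extend `prefix`. Frequent itemsets and their supports are stored in `out`."""
    for i, (item, tids) in enumerate(items):
        if len(tids) < min_support:
            continue
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Intersect tidsets to obtain the support of each longer itemset.
        suffix = [(other, tids & other_tids) for other, other_tids in items[i + 1:]]
        eclat(itemset, suffix, min_support, out)
    return out


# Hypothetical toy transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
vertical = sorted(to_vertical(transactions).items(), key=lambda kv: kv[0])
print(eclat((), vertical, min_support=2, out={}))
```

After the single pass that builds the vertical layout, no further database scans are needed, since every support count comes from set intersections.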
44. An Improved Apriori Algorithm Based on an Evolution-Communication Tissue-Like P System with Promoters and Inhibitors
- Author
-
Minghe Sun, Yuzhen Zhao, and Xiyu Liu
- Subjects
0209 industrial biotechnology ,Apriori algorithm ,Article Subject ,Computer science ,02 engineering and technology ,computer.software_genre ,Machine learning ,GSP Algorithm ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,Membrane computing ,Time complexity ,FSA-Red Algorithm ,business.industry ,lcsh:Mathematics ,InformationSystems_DATABASEMANAGEMENT ,Object (computer science) ,lcsh:QA1-939 ,ComputingMethodologies_PATTERNRECOGNITION ,Modeling and Simulation ,A priori and a posteriori ,020201 artificial intelligence & image processing ,Rewriting ,Artificial intelligence ,Data mining ,business ,computer - Abstract
Apriori algorithm, as a typical frequent itemsets mining method, can help researchers and practitioners discover implicit associations from large amounts of data. In this work, a fast Apriori algorithm, called ECTPPI-Apriori, for processing large datasets, is proposed, which is based on an evolution-communication tissue-like P system with promoters and inhibitors. The structure of the ECTPPI-Apriori algorithm is tissue-like and the evolution rules of the algorithm are object rewriting rules. The time complexity of ECTPPI-Apriori is substantially improved from that of the conventional Apriori algorithms. The results give some hints to improve conventional algorithms by using membrane computing models.
- Published
- 2017
45. An improved Apriori algorithm for mining association rules
- Author
-
Xiuli Yuan
- Subjects
Apriori algorithm ,Association rule learning ,Computer science ,business.industry ,InformationSystems_DATABASEMANAGEMENT ,Machine learning ,computer.software_genre ,Data mapping ,GSP Algorithm ,A priori and a posteriori ,Pruning (decision trees) ,Artificial intelligence ,Data mining ,business ,Database transaction ,computer ,FSA-Red Algorithm - Abstract
Among mining algorithms based on association rules, the Apriori technique, which mines frequent itemsets and interesting associations in transaction databases, is not only the first association rule mining technique to be used but also the most popular one. Study shows that traditional Apriori algorithms have two major bottlenecks: scanning the database frequently and generating a large number of candidate sets. To address these inherent defects, several improvements are made: 1) using a new database mapping method to avoid scanning the database repeatedly; 2) further pruning frequent itemsets and candidate itemsets to improve joining efficiency; 3) using an overlap strategy to count support efficiently. Under the same conditions, the results show that the proposed improved Apriori algorithm operates more efficiently than other improved algorithms.
- Published
- 2017
46. Optimization of Association Rule Mining Using Hybridized Artificial Bee Colony (ABC) with BAT Algorithm
- Author
-
N. Satyanarayana, P. Krishna Murthy, and S. Neelima
- Subjects
Apriori algorithm ,Association rule learning ,Computer science ,business.industry ,InformationSystems_DATABASEMANAGEMENT ,computer.software_genre ,Machine learning ,Work performance ,Artificial bee colony algorithm ,GSP Algorithm ,A priori and a posteriori ,Artificial intelligence ,Data mining ,business ,computer ,Bat algorithm ,FSA-Red Algorithm - Abstract
One of the major tasks of data mining is association rule mining, which is used for finding interesting relationships among the items in the itemsets of a huge database. Apriori is the familiar association rule mining algorithm for generating frequent itemsets; it uses a minimum support threshold to find frequent items. In this paper, an algorithm that hybridizes ABC with the BAT algorithm is proposed for optimizing association rules. In place of the onlooker bee phase of ABC, the random walk of BAT is used to increase exploration. The hybridized algorithm is applied to the rules generated by the Apriori algorithm in order to optimize them. Experiments on datasets taken from the UCI repository show the performance of the proposed work and indicate that the proposed methodology optimizes association rules effectively compared with existing algorithms. In the paper, we also show that the rules generated by the proposed work are simple and comprehensible.
- Published
- 2017
47. Survey on various improved Apriori Algorithms
- Author
-
Shilpi Singla and Arun Malik
- Subjects
Apriori algorithm ,Association rule learning ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,computer.software_genre ,GSP Algorithm ,Set (abstract data type) ,ComputingMethodologies_PATTERNRECOGNITION ,A priori and a posteriori ,Data mining ,K-optimal pattern discovery ,computer ,Algorithm ,FSA-Red Algorithm - Abstract
Data mining is a way of obtaining undetected patterns or facts from massive amounts of data in a database. Association rule mining is a major technique in the area of data mining; it finds frequent itemsets in a set of transactional databases. The Apriori algorithm is one of the earliest association rule mining algorithms. It employs an iterative approach known as level-wise search. In this paper, we present a survey of the most recent work by researchers on association rule mining using the Apriori algorithm.
- Published
- 2014
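The level-wise search that the abstract above attributes to Apriori can be sketched as follows. This is a minimal illustration of the textbook join, prune, and count loop, not code from any of the surveyed papers; the function name `apriori`, the absolute `min_support` count, and the sample transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join: combine frequent (k-1)-itemsets whose union has exactly k items.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k:
                # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
                if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count: one pass over the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions, for illustration only.
sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(sample, min_support=3)
```

The one database pass per level in the counting step is the repeated scanning that several of the improved variants listed on this page aim to reduce.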
48. The Design and Analysis of the Information Management System Based on Data Mining
- Author
-
Ya Ni Zhang
- Subjects
GSP Algorithm ,Management information systems ,Apriori algorithm ,Identification (information) ,Association rule learning ,Computer science ,General Medicine ,Pruning (decision trees) ,Data mining ,computer.software_genre ,computer ,Database transaction ,FSA-Red Algorithm - Abstract
This paper studies data mining technology based on association rules, analyzes the advantages and disadvantages of the Apriori algorithm, an important association rule algorithm, and puts forward an improved Apriori-mapping algorithm based on address mapping. The algorithm stores transactions horizontally and establishes, for each candidate item, an identification list of the corresponding transactions and the length of that list. Address mapping shortens the pruning time and greatly reduces the number of frequent itemsets involved in the join operation. Experiments show that system efficiency and the performance of the algorithm are improved.
- Published
- 2014
49. Data Series Mining Technology Analysis Based on Web Database
- Author
-
Yang Hu
- Subjects
Scheme (programming language) ,Sequence ,Database ,Computer science ,Data stream mining ,General Medicine ,Data series ,computer.software_genre ,Data warehouse ,GSP Algorithm ,Web mining ,Data mining ,computer ,computer.programming_language ,FSA-Red Algorithm - Abstract
Taking Web data as its basis, the paper studies data series mining technology. It investigates and implements the GSP sequence algorithm, which mines sequential patterns with higher efficiency, tests the algorithm, and analyzes the mined patterns. The paper also implements a scheme that uses a data warehouse (DW) to mine Web visit sequences and make full use of them, tests the scheme on logs from real business sites, analyzes the mining results, and concludes that the algorithm performs very well in application.
- Published
- 2014
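The core operation behind sequential pattern mining on Web visit logs, as described in the abstract above, is counting how many visit sequences contain a candidate pattern in order. The sketch below shows only that support-counting step; GSP's candidate generation and its time and window constraints are omitted, and the names `contains` and `support` and the sample sessions are hypothetical.

```python
def contains(sequence, pattern):
    """True if the pattern's elements appear in order, each as a subset of some element."""
    pos = 0
    for element in sequence:
        # Greedily match the next unmatched pattern element against this element.
        if pos < len(pattern) and set(pattern[pos]) <= set(element):
            pos += 1
    return pos == len(pattern)


def support(db, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in db if contains(seq, pattern))


# Hypothetical Web sessions: each session is an ordered list of page sets.
sessions = [
    [{"home"}, {"search"}, {"cart"}],
    [{"home"}, {"cart"}],
    [{"search"}, {"home"}],
]
home_then_cart = support(sessions, [{"home"}, {"cart"}])
search_then_home = support(sessions, [{"search"}, {"home"}])
```

Order matters here: the first session visits both `home` and `search`, but it does not count toward the `search` then `home` pattern because `home` comes first.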
50. Efficient updating of sequential patterns with transaction insertion
- Author
-
Chun-Wei Lin, Wen-Yang Lin, Tzung-Pei Hong, and Guo-Cheng Lan
- Subjects
GSP Algorithm ,Structure (mathematical logic) ,Important research ,Artificial Intelligence ,Computer science ,Process (computing) ,Computer Vision and Pattern Recognition ,Data mining ,computer.software_genre ,Database transaction ,computer ,Theoretical Computer Science - Abstract
Mining useful information from large databases has become an important research issue in recent years. Among the classes of derived knowledge, sequential patterns can be used to discover customer behaviors to facilitate efficient decision-making. The fast updated frequent-pattern (FUFP)-tree algorithm has been proposed to efficiently mine frequent itemsets through transaction insertion in incremental mining. The present study extends this algorithm to develop an incremental fast updated sequential-pattern (FUSP)-tree algorithm for efficiently handling sequential patterns whether the newly inserted transactions are from customers already in the database or from new customers. An FUSP-tree structure is designed to make the update process easier. The FUSP-growth algorithm is proposed for mining frequent sequences (sequential patterns) from the FUSP-tree. Experimental results show that the proposed FUSP-tree algorithm has good performance for incrementally handling newly inserted transactions.
- Published
- 2014