25 results on '"High utility patterns"'
Search Results
2. Interpretable Classifier Models for Decision Support Using High Utility Gain Patterns
- Author
-
Srikumar Krishnamoorthy
- Subjects
Analytics, interpretable machine learning, explainable artificial intelligence, classification, high utility patterns, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Ensemble models such as gradient boosting and random forests are proven to offer the best predictive performance on a wide variety of supervised learning problems. The high performance of these black box models, however, comes at a cost of model interpretability. They are also inadequate to meet regulatory demands and explainability needs of organizations. The model interpretability in high performance black-box models is achieved with the help of post-hoc explainable models such as Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). This paper presents an alternate intrinsic classifier model that extracts a class of higher order patterns and embeds them into an interpretable learning model. More specifically, the proposed model extracts novel High Utility Gain (HUG) patterns that capture higher order interactions, transforms the model input data into a new space, and applies interpretable classifier methods on the transformed space. We conduct rigorous experiments on forty benchmark binary and multi-class classification datasets to evaluate the proposed model against the state-of-the-art ensemble and interpretable classifier models. The proposed model was comprehensively assessed on three key dimensions: 1) quality of predictions using classifier measures such as accuracy, $F_{1}$ , AUC, H-measure, and logistic loss, 2) computational performance on large and high-dimensional data, and 3) interpretability aspects. The HUG-based learning model was found to deliver performance comparable to that of the state-of-the-art ensemble models. Our model was also found to achieve 2-40% (45%) prediction quality (interpretability) improvements with significantly lower computational requirements over other interpretable classifier models. Furthermore, we present case studies in finance and healthcare domains and generate one- and two-dimensional HUG profiles to illustrate the interpretability aspects of our HUG models. 
The proposed solution offers an alternative approach to building high-performance, transparent machine learning classifier models. We hope that our ML solution helps organizations meet their growing regulatory and explainability needs.
- Published
- 2024
3. A Robust Privacy Preserving Approach for Sanitizing Transaction Databases from Sensitive High Utility Patterns
- Author
-
Ashraf, Mohamed, Rady, Sherine, Abdelkader, Tamer, Gharib, Tarek F., Xhafa, Fatos, Series Editor, Hassanien, Aboul Ella, editor, Snášel, Václav, editor, Tang, Mincong, editor, Sung, Tien-Wen, editor, and Chang, Kuo-Chi, editor
- Published
- 2023
4. Efficient Approach for Damped Window-Based High Utility Pattern Mining With List Structure
- Author
-
Hyoju Nam, Unil Yun, Bay Vo, Tin Truong, Zhi-Hong Deng, and Eunchul Yoon
- Subjects
Data mining, damped window model, pattern pruning, high utility patterns, stream data mining, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Traditional pattern mining is designed to handle binary databases and assumes that all items in the database have the same importance, so it is limited in its ability to recognize accurate information from real-world databases. To solve this problem, high utility pattern mining approaches for non-binary databases have been proposed and actively studied by many researchers. Lately, new data is progressively created with the passage of time in diverse areas, such as biometric data of a patient diagnosed by a medical device and log data of an internet user, and the volume of a database gradually increases. A database with these characteristics is called a dynamic database. Under these circumstances, high utility mining techniques suitable for analyzing dynamic databases have recently been extensively studied. In this paper, we propose a new list-based algorithm that mines high utility patterns considering the arrival time of each transaction in an incremental database environment. That is, our algorithm efficiently performs pattern pruning by using a damped window model that treats previously inputted data as less important than recently inserted data, and identifies high utility patterns. Experimental results indicate that our proposed method has better performance than the state-of-the-art techniques in terms of runtime, memory, and scalability.
- Published
- 2020
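The damped window idea in the abstract above (older transactions count for less than recent ones) can be sketched in Python. This is only an illustration under stated assumptions: the abstract does not give the decay formula, so exponential decay by transaction age is assumed here, and `damped_utilities`, `transaction_utility`, the decay value, and the toy data are hypothetical names and numbers, not the authors' list-based algorithm.

```python
def transaction_utility(transaction, unit_profit):
    """Utility of one transaction: sum of quantity * unit profit per item."""
    return sum(qty * unit_profit[item] for item, qty in transaction.items())


def damped_utilities(stream, unit_profit, decay):
    """Weight each transaction utility by decay ** age.

    `stream` is ordered oldest to newest; the newest transaction has age 0,
    so it keeps its full utility (assumed exponential decay, 0 < decay <= 1).
    """
    newest = len(stream) - 1
    return [
        transaction_utility(t, unit_profit) * decay ** (newest - idx)
        for idx, t in enumerate(stream)
    ]


# Hypothetical toy stream: three identical transactions arriving over time.
stream = [{"a": 2}, {"a": 2}, {"a": 2}]
unit_profit = {"a": 4}
weights = damped_utilities(stream, unit_profit, decay=0.5)
print(weights)
```

With a decay of 0.5, the same purchase contributes 8 when it is the newest transaction but only 2 once two newer transactions have arrived.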
5. Incremental Mining of High Utility Patterns in One Phase by Absence and Legacy-Based Pruning
- Author
-
Junqiang Liu, Xinyi Ju, Xingxing Zhang, Benjamin C. M. Fung, Xiangcai Yang, and Changhong Yu
- Subjects
Data mining, utility mining, high utility patterns, pattern mining, dynamic databases, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Mining high utility patterns in dynamic databases is an important data mining task. While a naive approach is to mine a newly updated database in its entirety, the state-of-the-art mining algorithms all take an incremental approach. However, the existing incremental algorithms either take a two-phase paradigm that generates a large number of candidates and causes scalability issues, or employ a vertical data structure that incurs a large number of join operations and leads to efficiency issues. To address the challenges with the existing incremental algorithms, this paper proposes a new algorithm, incremental direct discovery of high utility patterns (Id2HUP+). Id2HUP+ adapts a one-phase paradigm by improving the relevance-based pruning and upper-bound-based pruning, proposes a novel data structure for a quick update of dynamic databases, and proposes the absence-based pruning and legacy-based pruning dedicated to incremental mining. The extensive experiments show that our algorithm is up to 1-3 orders of magnitude more efficient than the state-of-the-art algorithms, and is the most scalable algorithm.
- Published
- 2019
6. Performance Analysis of Tree-Based Algorithms for Incremental High Utility Pattern Mining
- Author
-
Ryang, Heungmo, Yun, Unil, Park, James J. (Jong Hyuk), editor, Pan, Yi, editor, Yi, Gangman, editor, and Loia, Vincenzo, editor
- Published
- 2017
7. Efficient approach of recent high utility stream pattern mining with indexed list structure and pruning strategy considering arrival times of transactions.
- Author
-
Nam, Hyoju, Yun, Unil, Yoon, Eunchul, and Chun- Wei Lin, Jerry
- Subjects
*SEQUENTIAL pattern mining, *SOCIAL media, *RIVERS, *SOCIAL networks
- Abstract
One of various pattern mining techniques, High Utility Pattern Mining (HUPM) is a method for finding meaningful patterns from non-binary databases by considering the characteristics of the items. Recently, new data continues to flow over time in diverse fields such as market sales data, heartbeat sensor data, and social network services. Since recently generated data in these fields have higher influence than old data, research has focused on how to efficiently extract hidden knowledge from time-sensitive databases. In this paper, we propose an indexed list-based algorithm that mines recent high utility patterns considering the arrival time of inserted data in an environment where new data is continuously accumulated. In other words, to treat the importance of recent data as higher than that of old data, our algorithm reduces the utility values of old transactions according to the time the data is inserted by applying the damped window model concept. Moreover, we carry out various experiments to compare our method with state-of-the-art algorithms using real and synthetic datasets in diverse circumstances. Experimental results show that our algorithm outperforms competitors in terms of execution time, memory usage, and scalability test. [ABSTRACT FROM AUTHOR]
- Published
- 2020
8. Time-Fading Based High Utility Pattern Mining from Uncertain Data Streams
- Author
-
Manike, Chiranjeevi, Om, Hari, Howlett, Robert J., Series editor, Jain, Lakhmi C., Series editor, Kumar Kundu, Malay, editor, Mohapatra, Durga Prasad, editor, Konar, Amit, editor, and Chakraborty, Aruna, editor
- Published
- 2014
9. Efficient mining of extraordinary patterns by pruning and predicting.
- Author
-
Liu, Junqiang, Chang, Zhongmin, Leung, Carson K.S., Wong, Raymond C.W., Xu, Yabo, and Zhao, Rong
- Subjects
*PRUNING, *EXAMPLE
- Abstract
Highlights • Extraordinary patterns are those with supports and utilities in opposite extremes. • Estimating tight lower bounds both on supports and utilities are possible. • Both upper bounds and lower bounds based pruning are effective. • Pattern growth with pruning and predicting improve efficiency 2 orders of magnitude. Abstract Pattern mining is an important data mining technology. The existing pattern mining algorithms mainly focus on discovery of ordinary patterns in databases, for example, frequent pattern mining finds patterns with high frequencies and utility pattern mining discovers patterns with high utilities. However, in many real-world applications, people are more interested in finding extraordinary patterns with low frequencies and high utilities or with high frequencies and low utilities. While mining ordinary patterns is computationally hard, it is even harder to mine extraordinary patterns. In particular, a two-phase approach that first generates and materializes candidates (high-frequency patterns or high-utility patterns) and then finds extraordinary patterns from the candidates, suffers from the scalability and efficiency bottlenecks. This paper proposes an efficient algorithm for mining extraordinary patterns. The novelty of our algorithm lies in newly proposed lower bounds both on frequencies and on utilities of patterns, new pruning strategies and new predicting strategies for dramatically reducing the search space, and a novel data structure for efficient computation. The proposed algorithm employs a single-phase approach without materializing candidates, and also adapts the upper bounds on supports and on utilities for pruning. Extensive experiments show that the new pruning and predicting strategies are effective, and the proposed algorithm is efficient and scalable. [ABSTRACT FROM AUTHOR]
- Published
- 2019
10. Efficient incremental high utility pattern mining based on pre-large concept.
- Author
-
Lee, Judae, Yun, Unil, Lee, Gangin, and Yoon, Eunchul
- Subjects
*DATA mining, *PATTERN recognition systems, *PROBLEM solving, *DATA structures, *DYNAMICAL systems
- Abstract
High utility pattern mining has been actively researched in recent years, because it treats real world databases better than traditional pattern mining approaches. Retail data of markets and web access information data are representative examples of the real world data. However, fundamental high utility pattern mining methods aiming static data are not proper for dynamic data environments. The pre-large concept based methods have efficiency compared to static approaches when dealing with dynamic data. There are several methods dealing with dynamic data based on the pre-large concept, but they have drawbacks that they have to scan original data again and generate many candidate patterns. These two drawbacks are the main issues of performance degradation. To handle these problems, in this paper, we suggest an efficient approach of pre-large concept based incremental utility pattern mining. The proposed method adopts a more proper data structure to mine high utility patterns in incremental environments. The state-of-the-art method performs a database scan operation many times, which is not suitable for incremental environments. However, our method needs only one scan, which is more suitable to process dynamic data compared to the state-of-the-art method. In addition, with the proposed data structure, high utility patterns can be mined in dynamic environments more efficiently than the former method. Experimental results on real datasets and synthetic datasets show that the proposed method has better performance than the former method. [ABSTRACT FROM AUTHOR]
- Published
- 2018
11. An efficient algorithm for mining high utility patterns from incremental databases with one database scan.
- Author
-
Yun, Unil, Ryang, Heungmo, Lee, Gangin, and Fujita, Hamido
- Subjects
*ALGORITHMS, *PATTERN recognition systems, *DATA mining, *UTILITY functions, *PROBLEM solving
- Abstract
High utility pattern mining has been actively researched as one of the significant topics in the data mining field since this approach can solve the limitation of traditional pattern mining that cannot fully consider characteristics of real world databases. Moreover, database volumes have been bigger gradually in various applications such as sales data of retail markets and connection information of web services, and general methods for static databases are not suitable for processing dynamic databases and extracting useful information from them. Although incremental utility pattern mining approaches have been suggested, previous approaches need at least two scans for incremental utility pattern mining irrespective of using any structure. However, the approaches with multiple scans are actually not adequate for stream environments. In this paper, we propose an efficient algorithm for mining high utility patterns from incremental databases with one database scan based on a list-based data structure without candidate generation. Experimental results with real and synthetic datasets show that the proposed algorithm outperforms previous one phase construction methods with candidate generation. [ABSTRACT FROM AUTHOR]
- Published
- 2017
12. Indexed list-based high utility pattern mining with utility upper-bound reduction and pattern combination techniques.
- Author
-
Ryang, Heungmo and Yun, Unil
- Subjects
COMPUTATIONAL acoustics, COMPUTATIONAL physics, MARKETING research, MINING methodology, SCIENTIFIC method
- Abstract
High utility pattern mining has been studied as an essential topic in the field of pattern mining in order to satisfy requirements of many real-world applications that need to process non-binary databases including item importance such as market analysis. In this paper, we propose an efficient algorithm with a novel indexed list-based data structure for mining high utility patterns. Previous approaches first generate an enormous number of candidate patterns on the basis of overestimation methods in their mining processes and then identify actual high utility patterns from the candidates through an additional database scan, which leads to high computational overheads. Although several list-based algorithms to discover high utility patterns without candidate generation have been suggested in recent years, they require a large number of comparison operations. Our method facilitates efficient mining of high utility patterns with the proposed indexed list by effectively reducing the total number of such operations. Moreover, we develop two techniques based on this novel data structure to more enhance mining performance of the proposed method. Experimental results on real and synthetic datasets show that the proposed algorithm mines high utility patterns more efficiently than the state-of-the-art algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2017
13. Opportunistic mining of top-n high utility patterns.
- Author
-
Liu, Junqiang, Zhang, Xingxing, Fung, Benjamin C.M., Li, Jiuyong, and Iqbal, Farkhund
- Subjects
*DATA mining, *ALGORITHMS, *PATTERN recognition systems, *UTILITY functions, *PROBLEM solving
- Abstract
Mining high utility patterns is an important data mining problem that is formulated as finding patterns whose utilities are no less than a threshold. As the mining results are very sensitive to such a threshold, it is difficult for users to specify an appropriate one. An alternative formulation of the problem is to find the top-n high utility patterns. However, the second formulation is more challenging because the corresponding threshold is unknown in advance and the solution search space becomes even larger. When there are very long patterns prior algorithms simply cannot work to mine top-n high utility patterns even for very small n. This paper proposes a novel algorithm for mining top-n high utility patterns that are long. The proposed algorithm adopts an opportunistic pattern growth approach and proposes five opportunistic strategies for scalably maintaining shortlisted patterns, for efficiently computing utilities, and for estimating tight upper bounds to prune search space. Extensive experiments show that the proposed algorithm is 1 to 3 orders of magnitude more efficient than the state-of-the-art top-n high utility pattern mining algorithms, and it is even up to 2 orders of magnitude faster than high utility pattern mining algorithms that are tuned with an optimal threshold. [ABSTRACT FROM AUTHOR]
- Published
- 2018
14. Modified GUIDE (LM) algorithm for mining maximal high utility patterns from data streams
- Author
-
Chiranjeevi Manike and Hari Om
- Subjects
High utility patterns, Data mining, Maximal Patterns, Anti-monotone property, Transaction projection, Electronic computers. Computer science, QA75.5-76.95
- Abstract
High utility pattern mining is an emerging research topic in the data mining field. Unlike frequent pattern mining, high utility pattern mining deals with non-binary databases, in which the information about purchased quantities of items is maintained. Due to the non-existence of anti-monotone property among the utilities of itemsets, utility mining becomes a big challenge. Moreover, discovering useful patterns from the huge number of potential patterns is a mining bottleneck. However, the compact (Closed and Maximal) high utility pattern mining moderately lessens the number of patterns, but it does not solve it. Recently, an efficient framework called GUIDE, was proposed in the literature to address this issue. Though, GUIDE effectively reduced the number of high utility patterns, yet the quality of few mined patterns and their utilities are not exact. In view of this, we propose a modified MGUIDE algorithm to improve the quality and determine exact utilities of maximal patterns.
- Published
- 2015
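The abstract above notes that itemset utilities lack the anti-monotone property. A minimal Python sketch of that point follows. It assumes the common definition of utility as purchased quantity times unit profit, uses hypothetical toy transactions and profits (none of the names or numbers come from the paper), and enumerates itemsets by brute force, which is only practical for tiny inputs.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit per item.
unit_profit = {"a": 1, "b": 5, "c": 2}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_patterns(transactions, unit_profit, min_util):
    """Return every itemset whose utility is at least min_util (brute force)."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


print(high_utility_patterns(transactions, unit_profit, min_util=10))
```

With a minimum utility of 10, the single item `a` (utility 3) is rejected while its superset `{a, b}` (utility 11) qualifies, so a low-utility itemset cannot be used to prune its supersets the way a low-support itemset can.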
15. Mining High Utility Patterns in One Phase without Generating Candidates.
- Author
-
Liu, Junqiang, Wang, Ke, and Fung, Benjamin C.M.
- Subjects
*DATA mining, *DATA structures, *DATABASES, *ALGORITHMS, *SCALABILITY
- Abstract
Utility mining is a new development of data mining technology. Among utility mining problems, utility mining with the itemset share framework is a hard one as no anti-monotonicity property holds with the interestingness measure. Prior works on this problem all employ a two-phase, candidate generation approach with one exception that is however inefficient and not scalable with large databases. The two-phase approach suffers from scalability issue due to the huge number of candidates. This paper proposes a novel algorithm that finds high utility patterns in a single phase without generating candidates. The novelties lie in a high utility pattern growth approach, a lookahead strategy, and a linear data structure. Concretely, our pattern growth approach is to search a reverse set enumeration tree and to prune search space by utility upper bounding. We also look ahead to identify high utility patterns without enumeration by a closure property and a singleton property. Our linear data structure enables us to compute a tight bound for powerful pruning and to directly identify high utility patterns in an efficient and scalable way, which targets the root cause with prior algorithms. Extensive experiments on sparse and dense, synthetic and real world data suggest that our algorithm is up to 1 to 3 orders of magnitude more efficient and is more scalable than the state-of-the-art algorithms. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
16. Fast algorithm for high utility pattern mining with the sum of item quantities.
- Author
-
Heungmo Ryang, Unil Yun, and Keun Ho Ryu
- Subjects
*DATA mining, *ALGORITHM research, *DATABASE searching, *COMPUTER science research, *DATA analysis
- Abstract
In frequent pattern mining, items are considered as having the same importance in a database and their occurrence are represented as binary values in transactions. In real-world databases, however, items not only have relative importance but also are represented as non-binary values in transactions. High utility pattern mining is one of the most essential issues in the pattern mining field, which recently emerged to address the limitation of frequent pattern mining. Meanwhile, tree construction with a single database scan is significant since a database scan is a time-consuming task. In utility mining, an additional database scan is necessary to identify actual high utility patterns from candidates. In this paper, we propose a novel tree structure, namely SIQTree (Sum of Item Quantities), which captures database information through a single-pass. Moreover, a restructuring method is suggested with strategies for reducing overestimated utilities. The proposed algorithm can construct the SIQ-Tree with only a single scan and decrease the number of candidate patterns effectively with the reduced overestimation utilities, through which mining performance is improved. Experimental results show that our algorithm outperforms a state-of-the-art one in terms of runtime and the number of generated candidates with a similar memory usage. [ABSTRACT FROM AUTHOR]
- Published
- 2016
17. Incremental Mining of High Utility Patterns in One Phase by Absence and Legacy-Based Pruning
- Author
-
Xiangcai Yang, Xingxing Zhang, Junqiang Liu, Benjamin C. M. Fung, Changhong Yu, and Xinyi Ju
- Subjects
General Computer Science, Computer science, dynamic databases, General Engineering, Data structure, computer.software_genre, Phase (combat), Task (project management), utility mining, Scalability, Task analysis, General Materials Science, Relevance (information retrieval), Pruning (decision trees), Data mining, lcsh:Electrical engineering. Electronics. Nuclear engineering, computer, lcsh:TK1-9971, high utility patterns, pattern mining
- Abstract
Mining high utility patterns in dynamic databases is an important data mining task. While a naive approach is to mine a newly updated database in its entirety, the state-of-the-art mining algorithms all take an incremental approach. However, the existing incremental algorithms either take a two-phase paradigm that generates a large number of candidates and causes scalability issues, or employ a vertical data structure that incurs a large number of join operations and leads to efficiency issues. To address the challenges with the existing incremental algorithms, this paper proposes a new algorithm, incremental direct discovery of high utility patterns (Id2HUP+). Id2HUP+ adapts a one-phase paradigm by improving the relevance-based pruning and upper-bound-based pruning, proposes a novel data structure for a quick update of dynamic databases, and proposes the absence-based pruning and legacy-based pruning dedicated to incremental mining. The extensive experiments show that our algorithm is up to 1-3 orders of magnitude more efficient than the state-of-the-art algorithms, and is the most scalable algorithm.
- Published
- 2019
18. Top-k high utility pattern mining with effective threshold raising strategies.
- Author
-
Ryang, Heungmo and Yun, Unil
- Subjects
*DATA mining, *COMPUTER users, *DATABASES, *ALGORITHMS, *UTILITY theory
- Abstract
In pattern mining, users generally set a minimum threshold to find useful patterns from databases. As a result, patterns with higher values than the user-given threshold are discovered. However, it is hard for the users to determine an appropriate minimum threshold. The reason for this is that they cannot predict the exact number of patterns mined by the threshold and control the mining result precisely, which can lead to performance degradation. To address this issue, top-k mining has been proposed for discovering patterns from ones with the highest value to ones with the kth highest value with setting the desired number of patterns, k. Top-k utility mining has emerged to consider characteristics of real-world databases such as relative importance of items and item quantities with the advantages of top-k mining. Although a relevant algorithm has been suggested in recent years, it generates a huge number of candidate patterns, which results in an enormous amount of execution time. In this paper, we propose an efficient algorithm for mining top-k high utility patterns with highly decreased candidates. For this purpose, we develop three strategies that can reduce the search space by raising a minimum threshold effectively in the construction of a global tree, where they utilize exact and pre-evaluated utilities of itemsets. Moreover, we suggest a strategy to identify actual top-k high utility patterns from candidates with the exact and pre-calculated utilities. Comprehensive experimental results on both real and synthetic datasets show that our algorithm with the strategies outperforms state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2015
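The general idea of raising a minimum threshold while collecting the top-k patterns, as described in the abstract above, can be sketched in Python with a min-heap. This is a generic illustration only: it assumes pattern utilities are already known, whereas the paper's three strategies operate during construction of a global tree. `top_k_patterns` and the sample utilities are hypothetical.

```python
import heapq


def top_k_patterns(scored_patterns, k):
    """Keep the k highest-utility patterns and track the rising threshold.

    `scored_patterns` is an iterable of (pattern, utility) pairs. Once k
    patterns are held, the threshold equals the k-th highest utility seen
    so far, and any pattern at or below it is skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    heap = []  # min-heap of (utility, pattern); heap[0] is the weakest kept
    threshold = 0
    for pattern, util in scored_patterns:
        if len(heap) < k:
            heapq.heappush(heap, (util, pattern))
        elif util > heap[0][0]:
            heapq.heapreplace(heap, (util, pattern))
        if len(heap) == k:
            threshold = heap[0][0]
    return sorted(heap, reverse=True), threshold


# Hypothetical utilities for six itemsets.
scored = [
    (("a",), 3), (("b",), 15), (("c",), 8),
    (("a", "b"), 11), (("a", "c"), 4), (("b", "c"), 11),
]
best, final_threshold = top_k_patterns(scored, k=2)
print(best, final_threshold)
```

In this run the threshold climbs from 3 to 8 to 11 as better patterns arrive, so later patterns with utility 4 and 11 are skipped without being stored.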
19. Incremental high utility pattern mining with static and dynamic databases.
- Author
-
Yun, Unil and Ryang, Heungmo
- Subjects
DATA mining, DECISION making, MEDICAL databases, DATA integration, ALGORITHMS
- Abstract
Pattern mining is a data mining technique used for discovering significant patterns and has been applied to various applications such as disease analysis in medical databases and decision making in business. Frequent pattern mining based on item frequencies is the most fundamental topic in the pattern mining field. However, it is difficult to discover the important patterns on the basis of only frequencies since characteristics of real-world databases such as relative importance of items and non-binary transactions are not reflected. In this regard, utility pattern mining has been considered as an emergent research topic that deals with the characteristics. In real-world applications, meanwhile newly generated data by continuous operation or data in other databases for integration analysis can be gradually added to the current database. To efficiently deal with both existing and new data as a database, it is necessary to reflect increased data to previous analysis results without analyzing the whole database again. In this paper, we propose an algorithm called HUPID-Growth (High Utility Patterns in Incremental Databases Growth) for mining high utility patterns in incremental databases. Moreover, we suggest a tree structure constructed with a single database scan named HUPID-Tree (High Utility Patterns in Incremental Databases Tree), and a restructuring method with a novel data structure called TIList (Tail-node Information List) in order to process incremental databases more efficiently. We conduct various experiments for performance evaluation with state-of-the-art algorithms. The experimental results show that the proposed algorithm more efficiently processes real datasets compared to previous ones. [ABSTRACT FROM AUTHOR]
- Published
- 2015
20. A Framework for Mining High Utility Web Access Sequences.
- Author
-
Ahmed, Chowdhury Farhan, Tanbeer, Syed Khairuzzaman, and Byeong-Soo Jeong
- Subjects
*DATA mining, *WEBSITES, *BLOGS, *DATABASE management, *INFORMATION storage & retrieval systems, *ISPL (Electronic computer system)
- Abstract
Mining web access sequences (WASs) can discover very useful knowledge from web logs with broad applications. By considering non-binary occurrences of web pages as internal utilities in WASs, e.g., time spent by each user in a web page, more realistic information can be extracted. However, the existing utility-based approach has many limitations: it considers only forward references of web access sequences, is not applicable to incremental mining, suffers from the level-wise candidate generation-and-test methodology, needs several database scans, and does not show how to mine web access sequences with different impacts/significances for different web pages. In this paper, we propose a novel framework to solve these problems. Moreover, we propose two new tree structures, called utility-based WAS tree (UWAS-tree) and incremental UWAS-tree (IUWAS-tree), for mining WASs in static and incremental databases, respectively. Our approach can handle both forward and backward references and both static and incremental data, avoids the level-wise candidate generation-and-test methodology, does not scan databases several times, and considers both internal and external utilities of a web page. The IUWAS-tree is also applicable for interactive mining. Extensive performance analyses show that our approach is very efficient for both static and incremental mining of high utility WASs. [ABSTRACT FROM AUTHOR]
- Published
- 2011
21. Analyzing of incremental high utility pattern mining based on tree structures
- Author
-
Gangin Lee, Unil Yun, and Judae Lee
- Subjects
General Computer Science, Computer science, 02 engineering and technology, Machine learning, computer.software_genre, lcsh:QA75.5-76.95, Tree-based algorithms, Utility mining, Pattern mining, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, lcsh:Information theory, Data mining, Data processing, business.industry, lcsh:Q350-390, Tree structure, Incremental mining, Benchmark (computing), High utility patterns, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electronic computers. Computer science, business, computer
- Abstract
Since the concept of high utility pattern mining was proposed to solve the drawbacks of the traditional frequent pattern mining approach, which cannot handle various features of real-world applications, many different techniques and algorithms for high utility pattern mining have been developed. Moreover, several advanced methods for incremental data processing have been proposed in recent years as the sizes of databases obtained in the real world become larger. In this paper, we introduce the basic concept of incremental high utility pattern mining and analyze various relevant methods. In addition, we conduct a performance evaluation of the methods with famous benchmark datasets in order to determine their detailed characteristics. The evaluation shows that generating fewer candidate patterns makes algorithms faster.
- Published
- 2017
22. Opportunistic mining of top-n high utility patterns
- Author
-
Jiuyong Li, Farkhund Iqbal, Junqiang Liu, Benjamin C. M. Fung, Xingxing Zhang, Liu, Junqiang, Zhang, Xingxing, Fung, Benjamin C.M., Li, Jiuyong, and Iqbal, Farkhund
- Subjects
Mathematical optimization, Information Systems and Management, Computer science, 02 engineering and technology, Space (commercial competition), frequent patterns, Computer Science Applications, Theoretical Computer Science, Orders of magnitude (bit rate), Artificial Intelligence, Control and Systems Engineering, utility mining, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, top-n interesting patterns, 020201 artificial intelligence & image processing, Software, pattern mining, high utility patterns
- Abstract
Mining high utility patterns is an important data mining problem that is formulated as finding patterns whose utilities are no less than a threshold. As the mining results are very sensitive to such a threshold, it is difficult for users to specify an appropriate one. An alternative formulation of the problem is to find the top-n high utility patterns. However, the second formulation is more challenging because the corresponding threshold is unknown in advance and the solution search space becomes even larger. When there are very long patterns, prior algorithms simply cannot mine top-n high utility patterns, even for very small n. This paper proposes a novel algorithm for mining top-n high utility patterns that are long. The proposed algorithm adopts an opportunistic pattern growth approach and proposes five opportunistic strategies for scalably maintaining shortlisted patterns, for efficiently computing utilities, and for estimating tight upper bounds to prune the search space. Extensive experiments show that the proposed algorithm is 1 to 3 orders of magnitude more efficient than the state-of-the-art top-n high utility pattern mining algorithms, and it is even up to 2 orders of magnitude faster than high utility pattern mining algorithms that are tuned with an optimal threshold. Refereed/Peer-reviewed
- Published
- 2018
23. Damped sliding based utility oriented pattern mining over stream data.
- Author
-
Kim, Heonho, Yun, Unil, Baek, Yoonji, Kim, Hyunsoo, Nam, Hyoju, Lin, Jerry Chun-Wei, and Fournier-Viger, Philippe
- Subjects
*SEQUENTIAL pattern mining , *RIVERS , *BATCH processing , *ELECTRONIC data processing , *SYNCHRONOUS generators - Abstract
High utility pattern mining (HUPM) discovers meaningful patterns by considering features of items and utility from non-binary data. Stream data is data that is continually generated over time. Various techniques based on high utility pattern mining have been suggested for processing stream data. High utility pattern mining based on a sliding window mines patterns using only the data stored in a window, so only the latest data can be managed. However, stream data has the property that newly created data has a higher influence than relatively old data, so it is necessary to weight the importance of the data stored in a window differently. In this paper, we propose an efficient algorithm based on a sliding window approach that mines high utility patterns from damped stream data, where new data is constantly being inserted, giving greater weight to the latest data. In other words, our technique divides the stream data into multiple fixed-size batches and uses a decaying factor to weight the importance of each batch in a window according to the time it was added. Moreover, we conduct experiments to compare and analyze our approach with the state-of-the-art algorithms using real and synthetic datasets. The experimental results show that our proposed method outperforms the competitors in terms of run time, memory usage, and scalability. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
24. Modified GUIDE (LM) algorithm for mining maximal high utility patterns from data streams
- Author
-
Manike, Chiranjeevi and Om, Hari
- Published
- 2015
- Full Text
- View/download PDF
25. Modified GUIDE (LM) algorithm for mining maximal high utility patterns from data streams
- Author
-
Hari Om and Chiranjeevi Manike
- Subjects
General Computer Science ,Transaction projection ,Property (programming) ,Data stream mining ,Computer science ,media_common.quotation_subject ,computer.software_genre ,lcsh:QA75.5-76.95 ,Bottleneck ,Field (computer science) ,Computational Mathematics ,High utility patterns ,Maximal Patterns ,Quality (business) ,Anti-monotone property ,lcsh:Electronic computers. Computer science ,Data mining ,Algorithm ,computer ,Utility mining ,media_common - Abstract
High utility pattern mining is an emerging research topic in the data mining field. Unlike frequent pattern mining, high utility pattern mining deals with non-binary databases, in which information about the purchased quantities of items is maintained. Because the utilities of itemsets do not satisfy the anti-monotone property, utility mining is a big challenge. Moreover, discovering useful patterns from the huge number of potential patterns is a mining bottleneck. Compact (closed and maximal) high utility pattern mining moderately reduces the number of patterns, but it does not solve the problem. Recently, an efficient framework called GUIDE was proposed in the literature to address this issue. Although GUIDE effectively reduces the number of high utility patterns, a few of the mined patterns are of lower quality and their utilities are not exact. In view of this, we propose a modified algorithm, MGUIDELM, to improve the quality and determine the exact utilities of maximal patterns.
- Published
- 2015
- Full Text
- View/download PDF