423 results
Search Results
2. Data science fundamentals for Python and MongoDB.
- Author
- Paper, David
- Subjects
MongoDB, Data mining, Python (Computer program language), COMPUTERS -- General, Programming & scripting languages
- Abstract
Summary: Build the foundational data science skills necessary to work with and better understand complex data science algorithms. This example-driven book provides complete Python coding examples to complement and clarify data science concepts, and enrich the learning experience. Coding examples include visualizations whenever appropriate. The book is a necessary precursor to applying and implementing machine learning algorithms. The book is self-contained. All of the math, statistics, stochastic, and programming skills required to master the content are covered. In-depth knowledge of object-oriented programming isn't required because complete examples are provided and explained. Data Science Fundamentals with Python and MongoDB is an excellent starting point for those interested in pursuing a career in data science. Like any science, the fundamentals of data science are a prerequisite to competency. Without proficiency in mathematics, statistics, data manipulation, and coding, the path to success is "rocky" at best. The coding examples in this book are concise, accurate, and complete, and perfectly complement the data science concepts introduced. What You'll Learn: prepare for a career in data science; work with complex data structures in Python; simulate with Monte Carlo and stochastic algorithms; apply linear algebra using vectors and matrices; utilize complex algorithms such as gradient descent and principal component analysis; wrangle, cleanse, visualize, and problem solve with data; use MongoDB and JSON to work with data.
- Published
- 2018
3. Process Mining Workshops. ICPM 2022 International Workshops, Bozen-Bolzano, Italy, October 23-28, 2022, Revised Selected Papers.
- Author
- Montali, Marco, Senderovich, Arik, and Weidlich, Matthias
- Subjects
Business mathematics & systems, Data mining, Health & safety aspects of IT, Information technology: general issues, Machine learning, business process management, conformance checking, data science, deep learning, event data, health informatics, knowledge graphs, machine learning, predictive process monitoring, process analytics, process discovery, process mining, process querying, streaming analytics
- Abstract
Summary: This open access book constitutes revised selected papers from the International Workshops held at the 4th International Conference on Process Mining, ICPM 2022, which took place in Bozen-Bolzano, Italy, during October 23-28, 2022. The conference focuses on the area of process mining research and practice, including theory, algorithmic challenges, and applications. The co-located workshops provided a forum for novel research ideas. The 42 papers included in this volume were carefully reviewed and selected from 89 submissions. They stem from the following workshops: 3rd International Workshop on Event Data and Behavioral Analytics (EDBA); 3rd International Workshop on Leveraging Machine Learning in Process Mining (ML4PM); 3rd International Workshop on Responsible Process Mining (RPM) (previously known as Trust, Privacy and Security Aspects in Process Analytics); 5th International Workshop on Process-Oriented Data Science for Healthcare (PODS4H); 3rd International Workshop on Streaming Analytics for Process Mining (SA4PM); 7th International Workshop on Process Querying, Manipulation, and Intelligence (PQMI); 1st International Workshop on Education meets Process Mining (EduPM); 1st International Workshop on Data Quality and Transformation in Process Mining (DQT-PM).
4. Computational intelligence and intelligent systems.
- Author
- Castiglione, Aniello, Li, Jin, Li, Kangshun, and Liu, Yong
- Subjects
Artificial intelligence, Computer simulation, Data mining
- Abstract
Summary: This book constitutes the refereed proceedings of the 7th International Symposium on Intelligence Computation and Applications, ISICA 2015, held in Guangzhou, China, in November 2015. The 77 revised full papers presented were carefully reviewed and selected from 189 submissions. The papers feature the most up-to-date research in analysis and theory of evolutionary computation, neural network architectures and learning; neuro-dynamics and neuro-engineering; fuzzy logic and control; collective intelligence and hybrid systems; deep learning; knowledge discovery; learning and reasoning.
- Published
- 2016
5. Mining Intelligence and Knowledge Exploration : Third International Conference, MIKE 2015, Hyderabad, India, December 9-11, 2015, Proceedings.
- Author
- Kathirvalavakumar, T., Prasath, Rajendra, and Vuppala, Anil Kumar
- Subjects
Algorithms, Application software, Artificial intelligence, Data mining, Information storage and retrieval, Optical data processing, Artificial Intelligence, Algorithm Analysis and Problem Complexity, Computer Imaging, Vision, Pattern Recognition and Graphics, Data Mining and Knowledge Discovery, Information Storage and Retrieval, Information Systems Applications (incl. Internet)
- Abstract
Summary: This book constitutes the refereed proceedings of the Third International Conference on Mining Intelligence and Knowledge Exploration, MIKE 2015, held in Hyderabad, India, in December 2015. The 48 full papers and 8 short papers presented together with 4 doctoral consortium papers were carefully reviewed and selected from 185 submissions. The papers cover a wide range of topics including information retrieval, machine learning, pattern recognition, knowledge discovery, classification, clustering, image processing, network security, speech processing, natural language processing, language, cognition and computation, fuzzy sets, and business intelligence.
- Published
- 2015
6. Machine learning and knowledge discovery in databases : European conference, ECML PKDD 2008, Antwerp, Belgium, September 15-19, 2008 : proceedings.
- Author
- Daelemans, Walter, Goethals, Bart, and Morik, Katharina
- Subjects
Artificial intelligence, Data mining, Machine learning
- Abstract
Summary: This book constitutes the refereed proceedings of the joint conference on Machine Learning and Knowledge Discovery in Databases: ECML PKDD 2008, held in Antwerp, Belgium, in September 2008. The 100 papers presented in two volumes, together with 5 invited talks, were carefully reviewed and selected from 521 submissions. In addition to the regular papers the volume contains 14 abstracts of papers appearing in full version in the Machine Learning Journal and the Knowledge Discovery and Databases Journal of Springer. The conference intends to provide an international forum for the discussion of the latest high quality research results in all areas related to machine learning and knowledge discovery in databases. The topics addressed are application of machine learning and data mining methods to real-world problems, particularly exploratory research that describes novel learning and mining tasks and applications requiring non-standard techniques.
- Published
- 2008
7. Summarization on the Data Mining Application Research in Chinese Education.
- Author
- Jiang, Ling, Yang, Zongkai, Liu, Qingtang, and Wei, Haimei
- Abstract
The application of data mining in the field of education is useful: it can help to improve teaching quality and support scientific management decisions. This paper details the data mining application research in Chinese education through retrieval of the relevant literature. It analyses the research status in China from several sides, as follows: statistical number, the study trend, the specialty background, the research hotspot and research approach. At last, this paper discusses the problems existing in this research literature. [ABSTRACT FROM AUTHOR]
- Published
- 2008
8. Fuzzy Association Rule Mining from Spatio-temporal Data.
- Author
- Calargun, Seda Unal and Yazici, Adnan
- Abstract
The use of fuzzy sets in mining association rules from spatio-temporal databases is useful since fuzzy sets are able to model the uncertainty embedded in the meaning of data. There are several fuzzy association rule mining techniques that can work on spatio-temporal data. Their ability to mine fuzzy association rules has to be compared on a realistic scenario. Besides the performance criteria, other criteria that can express the quality of an association rule discovered shall be specified. In this paper, fuzzy association rule mining is performed with spatio-temporal data cubes and Apriori algorithm. A real life application is developed to compare data cubes and Apriori algorithm according to the following criteria: interpretability, precision, utility, novelty, direct-to-the-point, performance and visualization, which are defined within the scope of this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2008
9. Text mining in practice with R.
- Author
- Kwartler, Ted
- Subjects
R programming, Telecommunications, Data mining, Text processing
- Abstract
Summary: A reliable, cost-effective approach to extracting priceless business information from all sources of text. Excavating actionable business insights from data is a complex undertaking, and that complexity is magnified by an order of magnitude when the focus is on documents and other text information. This book takes a practical, hands-on approach to teaching you a reliable, cost-effective approach to mining the vast, untold riches buried within all forms of text using R. Author Ted Kwartler clearly describes all of the tools needed to perform text mining and shows you how to use them to identify practical business applications to get your creative text mining efforts started right away. With the help of numerous real-world examples and case studies from industries ranging from healthcare to entertainment to telecommunications, he demonstrates how to execute an array of text mining processes and functions, including sentiment scoring, topic modelling, predictive modelling, extracting clickbait from headlines, and more. You'll learn how to: identify actionable social media posts to improve customer service; use text mining in HR to identify candidate perceptions of an organisation, match job descriptions with resumes, and more; extract priceless information from virtually all digital and print sources, including the news media, social media sites, PDFs, and even JPEG and GIF image files; make text mining an integral component of marketing in order to identify brand evangelists, impact customer propensity modelling, and much more. Most companies' data mining efforts focus almost exclusively on numerical and categorical data, while text remains a largely untapped resource. Especially in a global marketplace where being first to identify and respond to customer needs and expectations imparts an unbeatable competitive advantage, text represents a source of immense potential value.
Unfortunately, there is no reliable, cost-effective technology for extracting analytical insights from the huge and ever-growing volume of text available online and other digital sources, as well as from paper documents, until now.
- Published
- 2017
10. Modeling Decisions for Artificial Intelligence : 12th International Conference, MDAI 2015, Skövde, Sweden, September 21-23, 2015, Proceedings.
- Author
- Narukawa, Yasuo and Torra, Vicenc
- Subjects
Application software, Artificial intelligence, Data mining, Information storage and retrieval, Numerical analysis, Pattern recognition
- Abstract
Summary: This book constitutes the proceedings of the 12th International Conference on Modeling Decisions for Artificial Intelligence, MDAI 2015, held in Skövde, Sweden, in September 2015. The 18 revised full papers presented were carefully reviewed and selected from 38 submissions. They discuss theory and tools for modeling decisions, as well as applications that encompass decision making processes and information fusion techniques.
- Published
- 2015
11. A New Credit Scoring Method Based on Rough Sets and Decision Tree.
- Author
- Zhou, XiYue, Zhang, DeFu, and Jiang, Yi
- Abstract
Credit scoring is a very typical classification problem in Data Mining. Many classification methods have been presented in the literatures to tackle this problem. The decision tree method is a particularly effective method to build a classifier from the sample data. Decision tree classification method has higher prediction accuracy for the problems of classification, and can automatically generate classification rules. However, the original sample data sets used to generate the decision tree classification model often contain many noise or redundant data. These data will have a great impact on the prediction accuracy of the classifier. Therefore, it is necessary and very important to preprocess the original sample data. On this issue, a very effective approach is the rough sets. In rough sets theory, a basic problem that can be tackled using rough sets approach is reduction of redundant attributes. This paper presents a new credit scoring approach based on combination of rough sets theory and decision tree theory. The results of this study indicate that the process of reduction of attribute is very effective and our approach has good performance in terms of prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2008
12. Unsupervised Classification of Mixed Data Type of Attributes Using Genetic Algorithm (Numeric, Categorical, Ordinal, Binary, Ratio-Scaled).
- Author
- Rastogi, Rohit, Agarwal, Saumya, Sharma, Palak, Kaul, Uarvarshi, and Jain, Shilpi
- Abstract
Data mining discloses hidden, previously unknown, and potentially useful information from large amounts of data. As comparison to the traditional statistical and machine learning data analysis techniques, data mining emphasizes to provide a convenient and complete environment for the data analysis. Data mining has become a popular technology in analyzing complex data. Clustering is one of the data mining core techniques. In the field of data mining and data clustering, it is a highly desirable task to perform cluster analysis on large data sets with mixed numeric, categorical, ordinal, and ratio-scaled with binary and nominal values. However, most already available data merging and grouping through clustering algorithms are effective for the numeric data rather than the mixed data set. For this purpose, this paper makes efforts to present a new amalgamation algorithm for these mixed data sets by modifying the common cost function, trace of the within cluster dispersion matrix. The genetic algorithm (GA) is used to optimize the new cost function to obtain valid clustering result. We can compare and analyze that the GA-based clustering algorithm is feasible for the high-dimensional data sets with mixed data values that are obtained in real life results.
Core Idea of Our Paper : By this paper, we try to describe a technique for estimating the cost function metrics from mixed numeric, categorical and other type databases by using an uncertain grade-of-membership clustering model with the efficiency of Genetic Algorithm. This technique can be applied to the problem of opportunity analysis for business decision-making. This general approach could be adapted to many other applications where a decision agent needs to assess the value of items from a set of opportunities with respect to a reference set representing its business. For processing numeric attributes, instead of generalizing them, a prototype may be developed for experiments with synthetic and real data sets, and comparison with those of the traditional approaches. The results confirmed the feasibility of the framework and the superiority of the extended techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2014
13. Exploring Spatio-Temporal Features for Traffic Estimation on Road Networks.
- Author
- Wei, Ling-Yin, Peng, Wen-Chih, Lin, Chun-Shuo, and Jung, Chen-Hen
- Abstract
In this paper, given a query that indicates a query road segment and a query time, we intend to accurately estimate the traffic status (i.e., the driving speed) on the query road segment at the query time from traffic databases. Note that a traffic behavior in the same time usually reflects similar patterns (referring to the temporal feature), and nearby road segments have the similar traffic behaviors (referring to the spatial feature). By exploring the temporal and spatial features, more GPS data points are retrieved. In light of these GPS data retrieved, we exploit the weighted moving average approach to estimate traffic status on road networks. Experimental results show the effectiveness of our proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2009
14. Modeling of User Interest Based on Its Interaction with a Collaborative Knowledge Management System.
- Author
- Moreno-Llorena, Jaime, Alamán Roldán, Xavier, and Cobos Perez, Ruth
- Abstract
SKC is a prototype system for knowledge management in the Web by means of semantic information without supervision and tries to select the knowledge contained in the system by paying attention to its use. This paper explains user activity analysis in order to find out their interest for knowledge elements in the system, and the application of this interest for users classification and knowledge identification for their interest, inside and outside SKC. As a result a model for user interest based on interaction is obtained. [ABSTRACT FROM AUTHOR]
- Published
- 2009
15. Improving the Performance of Hierarchical Classification with Swarm Intelligence.
- Author
- Holden, Nicholas and Freitas, Alex A.
- Abstract
In this paper we propose a new method to improve the performance of hierarchical classification. We use a swarm intelligence algorithm to select the type of classification algorithm to be used at each "classifier node" in a classifier tree. These classifier nodes are used in a top-down divide and conquer fashion to classify the examples from hierarchical data sets. In this paper we propose a swarm intelligence based approach which attempts to mitigate a major drawback with a recently proposed local search-based, greedy algorithm. Our swarm intelligence based approach is able to take into account classifier interactions whereas the greedy algorithm is not. We evaluate our proposed method against the greedy method in four challenging bioinformatics data sets and find that, overall, there is a significant increase in performance. [ABSTRACT FROM AUTHOR]
- Published
- 2008
16. A Framework for Adaptive and Integrated Classification.
- Author
- Czarnowski, Ireneusz and Jędrzejowicz, Piotr
- Abstract
This paper focuses on classification tasks. The goal of the paper is to propose a framework for adaptive and integrated machine classification and to investigate the effect of different adaptation and integration schemes. After having introduced several integration and adaptation schemes a framework for adaptive and integrated classification in the form of the software shell is proposed. The shell allows for integrating data pre-processing with data mining stages using population-based and A-Team techniques. The approach was validated experimentally. Experiment results have shown that integrated and adaptive classification outperforms traditional approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2008
17. DynamicWEB: Profile Correlation Using COBWEB.
- Author
- Scanlan, Joel, Hartnett, Jacky, and Williams, Ray
- Abstract
Establishing relationships within a dataset is one of the core objectives of data mining. In this paper a method of correlating behaviour profiles in a continuous dataset is presented. The profiling problem which motivated the research is intrusion detection. The profiles are dynamic in nature, changing frequently, and are made up of many attributes. The paper describes a modified version of the COBWEB hierarchical conceptual clustering algorithm called DynamicWEB. DynamicWEB operates at runtime, keeping the profiles up to date, and in the correct location within the clustering tree. Further, as there are a number of attributes within the domain of interest, the tree also extends multi-dimensionally. This allows for multiple correlations to occur simultaneously, focusing on different attributes within the one profile. [ABSTRACT FROM AUTHOR]
- Published
- 2006
18. Integrating Data Mining and Agent Based Modeling and Simulation.
- Author
- Baqueiro, Omar, Wang, Yanbo J., McBurney, Peter, and Coenen, Frans
- Abstract
In this paper, we introduce an integration study which combines Data Mining (DM) and Agent Based Modeling and Simulation (ABMS). This study, as a new paradigm for DM/ABMS, is concerned with two approaches: (i) applying DM techniques in ABMS investigation, and inversely (ii) utilizing ABMS results in DM research. Detailed description of each approach is presented in this paper. A conclusion and the future work of this (integration) study are given at the end. [ABSTRACT FROM AUTHOR]
- Published
- 2009
19. Application of Classification Association Rule Mining for Mammalian Mesenchymal Stem Cell Differentiation.
- Author
- Wang, Weiqi, Wang, Yanbo J., Bañares-Alcántara, René, Cui, Zhanfeng, and Coenen, Frans
- Abstract
In this paper, data mining is used to analyze the differentiation of mammalian Mesenchymal Stem Cells (MSCs). A database comprising the key parameters which, we believe, influence the destiny of mammalian MSCs has been constructed. This paper introduces Classification Association Rule Mining (CARM) as a data mining technique in the domain of tissue engineering and initiates a new promising research field. The experimental results show that the proposed approach performs well with respect to the accuracy of (classification) prediction. Moreover, it was found that some rules mined from the constructed MSC database are meaningful and useful. [ABSTRACT FROM AUTHOR]
- Published
- 2009
20. Feedforward Neural Network with Multi-valued Connection Weights.
- Author
- Thammano, Arit and Ruxpakawong, Phongthep
- Abstract
This paper introduces a new concept of the connection weight to the multi-layer feedforward neural network. The architecture of the proposed approach is the same as that of the original multi-layer feedforward neural network. However, the weight of each connection is multi-valued, depending on the value of the input data involved. The backpropagation learning algorithm was also modified to suit the proposed concept. This proposed model has been benchmarked against the original feedforward neural network and the radial basis function network. The results on six benchmark problems are very encouraging. [ABSTRACT FROM AUTHOR]
- Published
- 2009
21. Agents for Searching Rules in Civil Engineering Data Mining.
- Author
- Kasperkiewicz, Janusz and Marks, Maria
- Abstract
The software agents are applied for a remote search of information. It seems natural that to analyse such information machine learning routines should be built-in into an agent system. After finding and processing the data the generated rules will be evaluated by means of so called interestingness measures, and only the best rules should be returned to the user. The paper presents situation in civil engineering data processing, as a suggestion for designers of intelligent software tools, to work out difficult but much needed procedures that should be implemented into autonomous agent system, intended for retrieving special kind of information searched for example by materials technologists. A simple architecture for an agent system is suggested without, however, getting into any technical details on how the elements of such system should be constructed. [ABSTRACT FROM AUTHOR]
- Published
- 2008
22. DynamicWEB: Adapting to Concept Drift and Object Drift in COBWEB.
- Author
- Scanlan, Joel, Hartnett, Jacky, and Williams, Raymond
- Abstract
Examining concepts that change over time has been an active area of research within data mining. This paper presents a new method that functions in contexts where concept drift is present, while also allowing for modification of the instances themselves as they change over time. This method is well suited to domains where subjects of interest are sampled multiple times, and where they may migrate from one resultant concept to another due to Object Drift. The method presented here is an extensive modification to the conceptual clustering algorithm COBWEB, and is titled DynamicWEB. [ABSTRACT FROM AUTHOR]
- Published
- 2008
23. Using CBR Systems for Leukemia Classification.
- Author
- Corchado, Juan M. and De Paz, Juan F.
- Abstract
The continuous advances in genomics, and specifically in the field of transcriptome, require novel computational solutions capable of dealing with great amounts of data. Each expression analysis needs different techniques to explore the data and extract knowledge which allows patient classification. This paper presents a hybrid system based on Case-based reasoning (CBR) for automatic classification of leukemia patients from Exon array data. The system incorporates novel algorithms for data mining that allow filtering and classification. The system has been tested and the results obtained are presented in this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2008
24. A Workflow-Based Approach for Creating Complex Web Wrappers.
- Author
- Montoto, Paula, Pan, Alberto, Raposo, Juan, Losada, José, Bellas, Fernando, and López, Javier
- Abstract
In order to let software programs access and use the information and services provided by web sources, wrapper programs must be built to provide a "machine-readable" view over them. Although research literature on web wrappers is vast, the problem of how to specify the internal logic of complex wrappers in a graphical and simple way remains mainly ignored. In this paper, we propose a new language for addressing this task. Our approach leverages on the existing work on intelligent web data extraction and automatic web navigation as building blocks, and uses a workflow-based approach to specify the wrapper control logic. The features included in the language have been decided from the results of a study of a wide range of real web automation applications from different business areas. In this paper, we also present the most salient results of the study. [ABSTRACT FROM AUTHOR]
- Published
- 2008
25. A Modified Clustering Method with Fuzzy Ants.
- Author
- Chen, Jianbin, Fang, Deying, and Xue, Yun
- Abstract
Ant-based clustering due to its flexibility, stigmergic and self-organization has been applied in variety areas from problems arising in commerce, to circuit design, and to text-mining, etc. A modified clustering method with fuzzy ants has been presented in this paper. Firstly, fuzzy ants and its behavior are defined; secondly, the new clustering algorithm has been constructed based on fuzzy ants. In this algorithm, we consider multiple ants based on Schockaert's algorithm. This algorithm can be accelerated by the use of parallel ants, global memory banks and density-based 'look ahead' method. Experimental results show that this algorithm is more efficient to other ant clustering methods. [ABSTRACT FROM AUTHOR]
- Published
- 2008
26. Outlier Detection Based on Granular Computing.
- Author
- Chen, Yuming, Miao, Duoqian, and Wang, Ruizhi
- Abstract
As an emerging conceptual and computing paradigm of information processing, granular computing has received much attention recently. Many models and methods of granular computing have been proposed and studied. Among them was the granular computing model using information tables. In this paper, we shall demonstrate the application of this granular computing model for the study of a specific data mining problem - outlier detection. Within the granular computing model using information tables, this paper proposes a novel definition of outliers - GrC (granular computing)-based outliers. An algorithm to find such outliers is also given. And the effectiveness of GrC-based method for outlier detection is demonstrated on three publicly available databases. [ABSTRACT FROM AUTHOR]
- Published
- 2008
27. Built-In Indicators to Discover Interesting Drill Paths in a Cube.
- Author
- Cariou, Véronique, Cubillé, Jérôme, Derquenne, Christian, Goutier, Sabine, Guisnel, Françoise, and Klajnmic, Henri
- Abstract
OLAP applications are widely used by business analysts as a decision support tool. While exploring the cube, end-users are rapidly confronted by analyzing a huge number of drill paths according to the different dimensions. Generally, analysts are only interested in a small part of them which corresponds to either high statistical associations between dimensions or atypical cell values. This paper fits in the scope of discovery-driven dynamic exploration. It presents a method coupling OLAP technologies and mining techniques to facilitate the whole process of exploration of the data cube by identifying the most relevant dimensions to expand. At each step of the process, a built-in rank on dimensions is restituted to the users. It is performed through indicators computed on the fly according to the user-defined data selection. A proof of the implementation of this concept on the Oracle 10g system is described at the end of the paper. [ABSTRACT FROM AUTHOR]
- Published
- 2008
28. Mining Serial Episode Rules with Time Lags over Multiple Data Streams.
- Author
- Lee, Tung-Ying, Wang, En Tzu, and Chen, Arbee L. P.
- Abstract
The problem of discovering episode rules from static databases has been studied for years due to its wide applications in prediction. In this paper, we make the first attempt to study a special episode rule, named serial episode rule with a time lag in an environment of multiple data streams. This rule can be widely used in different applications, such as traffic monitoring over multiple car passing streams in highways. Mining serial episode rules over the data stream environment is a challenge due to the high data arrival rates and the infinite length of the data streams. In this paper, we propose two methods considering different criteria on space utilization and precision to solve the problem by using a prefix tree to summarize the data streams and then traversing the prefix tree to generate the rules. A series of experiments on real data is performed to evaluate the two methods. [ABSTRACT FROM AUTHOR]
- Published
- 2008
29. An Artificial Immune System for Evolving Amino Acid Clusters Tailored to Protein Function Prediction.
- Author
- Secker, A., Davies, M. N., Freitas, A. A., Timmis, J., Clark, E., and Flower, D. R.
- Abstract
This paper addresses the classification task of data mining (a form of supervised learning) in the context of an important bioinformatics problem, namely the prediction of protein functions. This problem is cast as a hierarchical classification problem, where the protein functions to be predicted correspond to classes that are arranged in a hierarchical structure, in the form of a class tree. The main contribution of this paper is to propose a new Artificial Immune System that creates a new representation for proteins, in order to maximize the predictive accuracy of a hierarchical classification algorithm applied to the corresponding protein function prediction problem. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
30. Multiple Criteria Mathematical Programming and Data Mining.
- Author
-
Shi, Yong, Liu, Rong, Yan, Nian, and Chen, Zhenxing
- Abstract
Recently, researchers have extensively applied quadratic programming into classification, known as V. Vapnik's Support Vector Machine, as well as various applications. However, using optimization techniques to deal with data separation and data analysis goes back to more than forty years ago. Since 1998, the authors and their colleagues extended such a research idea into classification via multiple criteria linear programming (MCLP) and multiple criteria quadratic programming (MCQP). The purpose of the paper is to share our research results and promote the research interests in the community of computational sciences. These methods are different from statistics, decision tree induction, and neural networks. In this paper, starting from the basics of Multiple Criteria Linear Programming (MCLP), we further discuss penalized MCLP, Multiple Criteria Quadratic Programming (MCQP), Multiple Criteria Fuzzy Linear Programming, Multi-Group Multiple Criteria Mathematical Programming, as well as regression method by Multiple Criteria Linear Programming. A brief summary of applications of Multiple Criteria Mathematical Programming is also provided. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
31. CP-Tree: A Tree Structure for Single-Pass Frequent Pattern Mining.
- Author
-
Tanbeer, Syed Khairuzzaman, Ahmed, Chowdhury Farhan, Jeong, Byeong-Soo, and Lee, Young-Koo
- Abstract
FP-growth algorithm using FP-tree has been widely studied for frequent pattern mining because it can give a great performance improvement compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans which are not applicable to processing data streams. In this paper, we present a novel tree structure, called CP-tree (Compact Pattern tree), that captures database information with one scan (Insertion phase) and provides the same mining performance as the FP-growth method (Restructuring phase) by dynamic tree restructuring process. Moreover, CP-tree can give full functionalities for interactive and incremental mining. Extensive experimental results show that the CP-tree is efficient for frequent pattern mining, interactive, and incremental mining with single database scan. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
32. Exploring Big Haystacks.
- Author
-
Pollitt, Mark and Whitledge, Anthony
- Abstract
The proliferation of computer-generated evidence in court proceedings during the last fifteen years has given rise to the new science of digital forensics and a new breed of law enforcement officials, "computer forensic examiners," who apply the rules of evidence, investigative methods and sophisticated technical skills to analyze digital data for use in court proceedings. This paper explores the technical challenges facing the law enforcement community and discusses the application of data mining and knowledge management techniques to cope with the increasingly massive data sets involved in digital forensic investigations. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
33. An Interpretation of Flow Graphs by Granular Computing.
- Author
-
Sun, Jigui, Liu, Huawen, Qi, Changsong, and Zhang, Huijie
- Abstract
Flow graph (FG) is a unique approach in data mining and data analysis mainly in virtue of its well-structural characteristics of network, which is naturally consistent with granular computing (GrC). Meanwhile, GrC provides us with both structured thinking at the philosophical level and structured problem solving at the practical level. The main objective of the present paper is to develop a simple and more concrete model for flow graph using GrC. At first, FG will be mainly discussed in three aspects under GrC, namely, granulation of FG, some relationships and operations of granules. Moreover, as one of advantages of this interpretation, an efficient approximation reduction algorithm of flow graph is given under the framework of GrC. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
34. Construction of Enhanced Sentiment Sensitive Thesaurus for Cross Domain Sentiment Classification Using Wiktionary.
- Author
-
Sanju, P. and Mirnalinee, T. T.
- Abstract
Sentiment classification is the classification of reviews as positive or negative depending on the sentiment words expressed in them. Automatic sentiment classification is necessary in various applications such as market analysis, opinion mining, contextual advertisement and opinion summarization. However, sentiments are expressed differently in different domains, and annotating labels for every domain of interest is expensive and time consuming. In cross domain sentiment classification, a sentiment classifier trained in a source domain is applied to classify reviews of a target domain, which always produces low performance due to feature mismatch between the source domain and the target domain. The proposed method develops a solution to the feature mismatch problem in cross domain sentiment classification by creating an enhanced sentiment sensitive thesaurus using Wiktionary. The enhanced sentiment sensitive thesaurus aligns different words expressing the same sentiment, not only from different domains of reviews but also from Wiktionary, to increase the classification performance in the target domain. This paper describes the construction of the enhanced sentiment sensitive thesaurus, which will be useful for cross domain sentiment classification. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
35. Research on Data Mining Technology in IMA Safety Analysis.
- Author
-
Wang, Miao, Zhang, Lihua, Gu, Qingfan, and Wang, Guoqing
- Abstract
This study aims at the following problems brought by avionics system integration: failure spread caused by resource integration, failure implication and chaos resulted from function information fusion, and the difficulty in diagnosing the failure and expansion of failure damages triggered by mission synthesis. In this paper, we analyze the key problems of IMA system safety and take resource integration safety, function information fusion, and mission synthesis as the major objects of study to construct resource integration safety model, function information fusion safety model, and mission synthesis safety model and adopt data mining technology to set up knowledge transmission relationship among layers so as to reach safety management for top-layer mission execution in accordance with resource-layer safety data support. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
36. A Recommendation Method in E-Commerce Based on Product Taxonomy Graph.
- Author
-
Liu, Qian, Wang, Hongzhi, Gao, Hong, Lv, Qi, and Fu, Jianyu
- Subjects
ELECTRONIC commerce, ECONOMIES of scale, LATENT semantic analysis, DIMENSION reduction (Statistics), DATA mining
- Abstract
The data of e-commerce is growing at a rapid speed. As a result, customers are no longer able to achieve what they want to buy in a relatively short time. Collaborative Filtering (CF) is the most acceptable method about recommendation. However it has two limitations. One is sparsity, the other is scalability. In this paper we give a methodology to solve the problems based on product taxonomy graph. Data mining on product taxonomy graph helps make the transaction data in more aggregated way which is expected to solve the sparsity and scalability problem in CF. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
37. Soft-Constraint Based Online LDA for Community Recommendation.
- Author
-
Kang, Yujie and Yu, Nenghai
- Abstract
As the number of social communities grows, social community recommendation has gradually become a critical technique for users to efficiently find their favorite communities. Currently a variety of recommendation techniques have been developed, such as content-based methods, collaborative filtering, etc. These methods either easily overfit the data due to the limitation of observations or suffer from heavy computational cost. Besides, they don't consider the relationships between users and communities, and cannot handle incoming users. In this paper, we propose a soft-constraint based online LDA (SO-LDA) method. We use the number of a user's posts within each community as a soft constraint to estimate the latent topics across the communities by an online LDA algorithm, in which an incremental method is adopted to facilitate model updating when a new user arrives. Experiments on the well-known MySpace community data show that the proposed method takes much less time and outperforms the state-of-the-art community recommendation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
38. Enhancing Agent Intelligence through Data Mining: A Power Plant Case Study.
- Author
-
Athanasopoulou, Christina and Chatziathanasiou, Vasilis
- Abstract
In this paper, the methodology for an intelligent assistant for power plants is presented. Multiagent systems technology and data mining techniques are combined to enhance the intelligence of the proposed application, mainly in two aspects: increase the reliability of input data (sensor validation and false measurement replacement) and generate new control monitoring rules. Various classification algorithms are compared. The performance of the application, as tested via simulation experiments, is discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
39. A Self-Organized Multiagent System for Intrusion Detection.
- Author
-
Palomo, Esteban J., Domínguez, Enrique, Luque, Rafael M., and Muñoz, Jose
- Abstract
This paper describes a multiagent system with capabilities to analyze and discover knowledge gathered from distributed agents. These enhanced capabilities are obtained through a dynamic self-organizing map and a multiagent communication system. The central administrator agent dynamically obtains information about the attacks or intrusions from the distributed agents and maintains a knowledge pool using a proposed growing self-organizing map. The approach integrates traditional mathematical and data mining techniques with a multiagent system. The proposed system is used to build an intrusion detection system (IDS) as a network security application. Finally, experimental results are presented to confirm the good performance of the proposed system. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
40. Compression-Based Measures for Mining Interesting Rules.
- Author
-
Suzuki, Einoshin
- Abstract
An interestingness measure estimates the degree of interestingness of a discovered pattern and has been actively studied in the past two decades. Several pitfalls should be avoided in the study such as a use of many parameters and a lack of systematic evaluation in the presence of noise. Compression-based measures have advantages in this respect as they are typically parameter-free and robust to noise. In this paper, we present J-measure and a measure based on an extension of the Minimum Description Length Principle (MDLP) as compression-based measures for mining interesting rules. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
41. Adaptive Visual Clustering for Mixed-Initiative Information Structuring.
- Author
-
Duman, Hakan, Healing, Alex, and Ghanea-Hercock, Robert
- Abstract
Cyclone is a mixed-initiative and adaptive clustering and structure generation environment which is capable of learning categorization behavior through user interaction as well as conducting auto-categorization based on the extracted model. The strength of Cyclone resides in its integration of several visualization and interface techniques with data mining and AI learning processes. This paper presents the intuitive visual interface of Cyclone which empowers the user to explore, analyze, exploit and structure unstructured information from various sources generating a personalized taxonomy in real-time and on-the-fly. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
42. Frost Prediction Characteristics and Classification Using Computational Neural Networks.
- Author
-
Sallis, Philip, Jarur, Mary, and Trujillo, Marcelo
- Abstract
The effect of frost on the successful growth and quality of crops is well understood by growers as leading potentially to total harvest failure. Studying the frost phenomenon, especially in order to predict its occurrence has been the focus of numerous research projects and investigations. Frost prone areas are of particular concern. Grape growing for wine production is a specific area of viticulture and agricultural research. This paper describes the problem, outlines a wider project that is gathering climate and atmospheric data, together with soil, and plant data in order to determine the inter-dependencies of variable values that both inform enhanced crop management practices and where possible, predict optimal growing conditions. The application of some novel data mining techniques together with the use of computational neural networks as a means to modeling and then predicting frost is the focus of the investigation described here as part of the wider project. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
43. Customer's Relationship Segmentation Driving the Predictive Modeling for Bad Debt Events.
- Author
-
Pinheiro, Carlos Andre Reis and Helfert, Markus
- Abstract
This paper covers a comparison between two distinct approaches to neural network modeling. The first one is based on a developing of a single neural network model to predict bad debt events. The second one is based on combined models, building firstly a clustering model to recognize the pattern assigned to the customers, with a particular focus on the insolvency, and then developing several distinct neural networks to predict bad debt. In the second approach, for each group identified by the clustering model one neural network had been constructed. In that way, we turned the quite heterogeneous customer base more homogeneous, increasing the average accuracy for the predictive modeling once several straightforward models were built. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
44. A Design for Library Marketing System and Its Possible Applications.
- Author
-
Minami, Toshiro
- Abstract
Library marketing system is a system that helps with improving patrons' convenience and library management based on the data in libraries by analyzing them with data mining methods, including statistical ones. In this paper we present a design of such a system which deals with usage data of materials and extracts knowledge and tips that are useful, for example, for better arrangements of bookshelves and for providing patrons with information which will attract the patrons. Two methods are proposed for collecting usage data from bookshelves; one is with RFID and the other is with two-dimensional code, such as the QR code that is very popularly used in mobile phones. By combining several analysis methods, we can construct a library marketing system, which will give benefits to library management and patron services. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
45. An Efficient Candidate Pruning Technique for High Utility Pattern Mining.
- Author
-
Ahmed, Chowdhury Farhan, Tanbeer, Syed Khairuzzaman, Jeong, Byeong-Soo, and Lee, Young-Koo
- Abstract
High utility pattern mining extracts more useful and realistic knowledge from transaction databases compared to the traditional frequent pattern mining by considering the non-binary frequency values of items in transactions and different profit values for every item. However, the existing high utility pattern mining algorithms suffer from the level-wise candidate generation-and-test problem and need several database scans to mine the actual high utility patterns. In this paper, we propose a novel tree-based candidate pruning technique HUC-Prune (high utility candidates prune) to efficiently mine high utility patterns without level-wise candidate generation-and-test. It exploits a pattern growth mining approach and needs maximum three database scans in contrast to several database scans of the existing algorithms. Extensive experimental results show that our technique is very efficient for high utility pattern mining and it outperforms the existing algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
46. A Corpus-Based Approach for Automatic Thai Unknown Word Recognition using Ensemble Learning Techniques.
- Author
-
TeCho, Jakkrit, Nattee, Cholwich, and Theeramunkong, Thanaruk
- Abstract
This paper presents a corpus-based approach for automatic unknown word recognition in Thai. This approach applies an ensemble learning technique to generate a model for classifying unknown word candidates using features obtained from a corpus. We propose a technique called "group-based evaluation by ranking". It clusters the unknown word candidates into groups based on the occurring locations. The candidate with the highest accuracy is then identified as an unknown word. In this task, the number of positive instances is dominantly smaller than that of negative instances, forming an unbalanced data set. To improve the prediction accuracy, we apply a boosting technique with "voting under group-based evaluation by ranking". We have conducted experiments on real-world data to evaluate the performance of the proposed approach. The experiments compared the accuracy of our technique with an ordinary naïve Bayes technique. Our technique achieves the accuracy 90.93±0.50% when the first rank is selected and 97.90±0.26% when the candidates up to the tenth rank are considered. This is 6.79% to 8.45% improvement. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
47. Interactive Abnormal Condition Sign Discovery for Hydroelectric Power Plants.
- Author
-
Ito, Norihiko, Onoda, Takashi, and Yamasaki, Hironobu
- Abstract
Kyushu Electric Power Co., Inc. collects various sensor data and weather information to maintain hydroelectric power plants while the plants are running. However, abnormal and trouble condition data very rarely occur in power equipment, and it is hard to construct an experimental hydroelectric power plant to collect such data because the cost is very high. In this situation, we have to find abnormal condition data as a form of risk management. In this paper, we consider that the abnormal condition sign may be unusual condition data. This paper shows results of unusual condition data of bearing vibration detected from the collected sensor data and weather information by using a one-class support vector machine. The results show that our approach may be useful for unusual condition data detection and for maintaining hydroelectric power plants. Therefore, the proposed method is one form of risk management for hydroelectric power plants. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
48. Agent-Based In-Store Simulator for Analyzing Customer Behaviors in a Super-Market.
- Author
-
Terano, Takao, Kishimoto, Ariyuki, Takahashi, Toru, Yamada, Takashi, and Takahashi, Masakazu
- Abstract
This paper presents an agent-based simulator for investigating customer walking flows and purchasing behaviors in a supermarket. So far, such investigations have been very costly to carry out in real situations. The simulator enables us to carry out "virtual experiments" by changing various parameters of retail businesses and store operations. For this purpose, first we observe an actual retail store and analyze sales data. Then we develop the simulation model: Agent-Based In-Store Simulator (ABISS). Intensive experiments have revealed that the flow of customers, which is related to the sales, depends on the design of a store and that the placement of in-store advertisements and the recommendation system affects sales. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
49. SlopeMiner: An Improved Method for Mining Subtle Signals in Time Course Microarray Data.
- Author
-
McCormick, Kevin, Shrivastava, Roli, and Liao, Li
- Abstract
This paper presents an improved method, SlopeMiner, for analyzing time course microarray data by identifying genes that undergo gradual transitions in expression level. The algorithm calculates the slope for the slow transition between the expression levels of data, matching the sequence of expression level for each gene against temporal patterns having one transition between two expression levels. The method, when used along with StepMiner, an existing method for extracting binary signals, significantly increases the annotation accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
50. A Novel Algorithm for Associative Classification.
- Author
-
Kundu, Gourab, Munir, Sirajum, Bari, Md. Faizul, Islam, Md. Monirul, and Murase, Kazuyuki
- Abstract
Associative classifiers have been the subject of intense research for the last few years. Experiments have shown that they generally result in higher accuracy than decision tree classifiers. In this paper, we introduce a novel algorithm for associative classification, "Classification based on Association Rules Generated in a Bidirectional Approach" (CARGBA). It generates rules in two steps. At first, it generates a set of high confidence rules of smaller length with support pruning and then augments this set with some high confidence rules of higher length with support below minimum support. Experiments on 6 datasets show that our approach achieves better accuracy than other state-of-the-art associative classification algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF