39 results on '"Sebastián Ventura"'
Search Results
2. Efficient Frequent Chronicle Mining Algorithms: Application to Sleep Disorder
- Author
Hareth Zmezm, Jose Maria Luna, Eduardo Almeda, and Sebastian Ventura
- Subjects
Frequent event graphs; chronicle mining; sequence mining; temporal data mining; sleep disorder; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
- Abstract
Sequential pattern mining is a dynamic and thriving research field that aims to extract recurring sequences of events from complex datasets. Traditionally, focusing solely on the order of events often falls short of providing precise insights. Consequently, incorporating the temporal intervals between events has emerged as a vital necessity across various domains, e.g. medicine. Analyzing temporal event sequences within patients’ clinical histories, drug prescriptions, and monitoring alarms exemplifies this critical need. This paper presents innovative and efficient methodologies for mining frequent chronicles from temporal data. The mined graphs offer a significantly more expressive representation than mere event sequences, capturing intricate details of a series of events in a factual manner. The experimental stage includes a series of analyses of diverse databases with distinct characteristics. The proposed approaches were also applied to real-world data comprising information about subjects suffering from sleep disorders. Alluring frequent complete event graphs were obtained on patients who were under the effect of sleep medication.
- Published
- 2024
3. Tree-Shaped Ensemble of Multi-Label Classifiers using Grammar-Guided Genetic Programming
- Author
Eva Gibaja, Jose M. Moyano, Krzysztof J. Cios, and Sebastián Ventura
- Subjects
Grammar; Genetic programming algorithm; business.industry; media_common.quotation_subject; Genetic programming; Pattern recognition; 02 engineering and technology; Ensemble learning; Prediction algorithms; ComputingMethodologies_PATTERNRECOGNITION; Tree shaped; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; Classifier (UML); media_common
- Abstract
The multi-label classification paradigm has attracted growing interest because of the emergence of a large number of classification problems where each instance can be associated with several output labels simultaneously. Several ensemble methods have been proposed to solve the multi-label classification problem. However, most of them simply create diversity in the ensemble by following a random procedure and give the same importance to all members. In this paper, we propose a Grammar-Guided Genetic Programming algorithm to build ensembles of multi-label classifiers. Given a pool of multi-label classifiers, each modeling dependencies among a subset of k labels, they are combined into a tree-shaped ensemble. At each node of the tree, the predictions of its child nodes are combined, while each leaf represents a classifier from the pool. We propose two configurations for the method: using a fixed value of k for all classifiers in the pool, or using a variable value of k for each classifier, thus capturing relationships among groups of labels of different sizes in the ensemble. Experiments on sixteen multi-label datasets using five evaluation metrics demonstrated that our method performs significantly better than state-of-the-art ensembles of multi-label classifiers.
- Published
- 2020
4. A Preliminary Study on Evolutionary Clustering for Multiple Instance Learning
- Author
Sebastián Ventura, Amelia Zafra, and Aurora Esteban
- Subjects
Computer science; business.industry; Feature vector; Evolutionary algorithm; 02 engineering and technology; Object (computer science); Machine learning; computer.software_genre; Set (abstract data type); Evolutionary clustering; 020204 information systems; Genetic algorithm; 0202 electrical engineering, electronic engineering, information engineering; Task analysis; 020201 artificial intelligence & image processing; Artificial intelligence; Representation (mathematics); business; Cluster analysis; computer
- Abstract
Since its beginnings, multiple instance learning has shown excellent performance in the areas where it has been applied. This effectiveness arises because multiple instance learning represents a complex object by a set of feature vectors, a more flexible representation that preserves more information than one based on a single feature vector. This paper attempts to make progress in this area with a first study that introduces evolutionary algorithms for multiple instance cluster analysis. Specifically, we present four genetic algorithms for multi-instance partitional clustering: three of them are adaptations of existing algorithms for single-instance clustering, while the last one is a novel approach based on the CHC evolutionary algorithm. Moreover, two classic non-genetic partitional algorithms are included in the final comparison. Experiments on ten representative datasets show promising results for our proposal.
- Published
- 2020
5. MiNerDoc: a Semantically Enriched Text Mining System to Transform Clinical Text into Knowledge
- Author
Sebastián Ventura, José María Luna, and Carmen Luque
- Subjects
0303 health sciences; Computer science; business.industry; Unified Medical Language System; Feature extraction; 02 engineering and technology; computer.file_format; computer.software_genre; Enriched text; 03 medical and health sciences; Text mining; Unified Modeling Language; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Task analysis; Diagnosis code; Artificial intelligence; business; computer; Natural language processing; 030304 developmental biology; Coding (social sciences); computer.programming_language
- Abstract
Existing systems that support the daily decision-taking process carried out by health professionals need to be used independently to perform different text mining subtasks. In practice, few systems unify all the subtasks into a single framework, thereby easing clinical work by automating complex clinical tasks such as the detection of clinical alerts and clinical information coding. In this sense, the MiNerDoc system is proposed, whose main objective is to support the clinical decision-taking process by analysing large volumes of textual clinical reports in a unified framework. MiNerDoc performs two basic functions that are of great importance in the medical field: detection of risk factors based on the recognition of five medical entities (Disease, Pharmacologic, Region/Part Body, Procedure/Test, Finding/Sign), and automatic prediction of standardized diagnostic codes (MeSH descriptors). A major feature of MiNerDoc is that it includes external knowledge sources such as MetaMap and UMLS to terminologically and semantically enrich the interpretation of clinical texts. Several case studies are considered in this work to demonstrate the power of MiNerDoc.
- Published
- 2019
6. Obtaining Tractable and Interpretable Descriptions for Cases with Complications from a Colorectal Cancer Database
- Author
Jose-Antonio Delgado-Osuna, Sebastián Ventura, Carlos García-Martínez, and Jose Gomez Barbadillo
- Subjects
education.field_of_study; Association rule learning; Database; business.industry; Colorectal cancer; Population; Cancer; 02 engineering and technology; medicine.disease; computer.software_genre; University hospital; 030218 nuclear medicine & medical imaging; 03 medical and health sciences; 0302 clinical medicine; 0202 electrical engineering, electronic engineering, information engineering; Medicine; 020201 artificial intelligence & image processing; business; education; computer
- Abstract
Colorectal cancer affects a significant portion of the population and is one of the leading causes of cancer-related deaths in many countries. Professionals at the Reina Sofia University Hospital have maintained a database on this pathology for more than 10 years. In this work, we apply classification and association rule learning tools, including a new methodology, to obtain tractable and interpretable descriptions of those cases where complications appeared; the presence of complications is one of the attributes in the database.
- Published
- 2019
7. A Supervised Methodology for Analyzing Dysregulation in Splicing Machinery: An Application in Cancer Diagnosis
- Author
Sebastián Ventura, Raúl M. Luque, Justo P. Castaño, and Oscar Reyes
- Subjects
0301 basic medicine; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Computer science; 030220 oncology & carcinogenesis; RNA splicing; Molecular targets; Genomics; Computational biology; Classifier (UML); Weighting
- Abstract
Deregulated splicing factors have been shown to be associated with the development of several types of cancer and, therefore, determining such alterations can help the development of tumor-specific molecular targets for early prognosis and therapy. Determining the relevant splicing factors, however, is not a straightforward task, mainly due to the heterogeneity of tumors and the variability across samples. In this work, a methodology based on supervised machine learning methods is proposed, allowing the determination of subsets of relevant factors that best discriminate samples. The methodology comprises three main phases: first, a ranking of splicing factors is determined by applying feature weighting algorithms; second, the best subset of factors that allows the induction of an accurate classifier is detected; third, the confidence in the induced classifier is assessed by explaining the individual predictions. Finally, the utility and benefit of the proposed methodology are illustrated by analyzing a small dataset of neuroendocrine lung carcinoids; the results showed that there exist small subsets of deregulated factors which can effectively distinguish between tumor samples and their respective adjacent non-tumor tissues.
- Published
- 2019
8. Discovering Students’ Engagement Behaviors in Confidence-based Assessment
- Author
Rabia Maqsood, Paolo Ceravolo, and Sebastián Ventura
- Subjects
Qualitative analysis; Engineering education; education; Task analysis; Mathematics education; Disengagement theory; Cluster analysis; Psychology
- Abstract
Considering the usefulness of monitoring students’ response to available task-level feedback in confidence-based assessment, in this paper we introduce a novel approach to classify students’ problem-solving activities into various engagement and disengagement behaviors and study their occurrences during complete learning sessions. Then, by clustering these sessions, we obtained four distinct groups which varied both in terms of students’ (dis)engagement behaviors and their quantitative performance scores in confidence-based assessment. Moreover, a qualitative analysis shows that high- and low-performance students (determined based on their final scores in the course) relate differently to the obtained clusters. Based on these findings, we highlight that our approach of investigating students’ engagement by observing traces of performed problem-solving activities is promising and opens new avenues of research. Also, our approach is more generic, as it does not rely on human-expert-defined time limits, which are usually determined by analyzing data from the students who participated in the experimental study.
- Published
- 2019
9. Large-Scale Multi-label Ensemble Learning on Spark
- Author
Sebastián Ventura, Jorge Gonzalez-Lopez, and Alberto Cano
- Subjects
Speedup; Computational complexity theory; Distributed database; Computer science; business.industry; 02 engineering and technology; Machine learning; computer.software_genre; Ensemble learning; Instruction set; ComputingMethodologies_PATTERNRECOGNITION; 020204 information systems; Spark (mathematics); Scalability; 0202 electrical engineering, electronic engineering, information engineering; Benchmark (computing); 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; business; computer
- Abstract
Multi-label learning is a challenging problem which has received growing attention in the research community in recent years. Hence, there is a growing demand for effective and scalable multi-label learning methods for larger datasets, both in terms of the number of instances and the number of output labels. The use of ensemble classifiers is a popular approach for improving multi-label model accuracy, especially for datasets with high-dimensional label spaces. However, the increasing computational complexity of the algorithms in such ever-growing high-dimensional label spaces requires new approaches to manage data effectively and efficiently in distributed computing environments. Spark is a framework based on MapReduce, a distributed programming model that offers a robust paradigm to handle large-scale datasets in a cluster of nodes. This paper focuses on multi-label ensembles and proposes a number of implementations through the use of parallel and distributed computing using Spark. Additionally, five different implementations are proposed and the impact on the performance of the ensemble is analyzed. The experimental study, conducted on 29 benchmark datasets, shows the benefits of using distributed implementations over the traditional single-node single-thread execution, in terms of performance over multiple metrics as well as significant speedup.
- Published
- 2017
10. Multi-view semi-supervised learning using genetic programming interpretable classification rules
- Author
Sebastián Ventura and Carlos García-Martínez
- Subjects
business.industry; Computer science; 020206 networking & telecommunications; Context (language use); Genetic programming; 02 engineering and technology; Semi-supervised learning; Extension (predicate logic); Machine learning; computer.software_genre; Kernel (statistics); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer
- Abstract
Multi-view learning is a novel paradigm that aims at obtaining better results by examining the information from several perspectives instead of analysing the same information from a single viewpoint. The multi-view methodology has been widely used for semi-supervised learning, where just some patterns have been previously classified by an expert and there is a large number of unlabelled ones. However, to our knowledge, the multi-view learning paradigm has not been applied to produce interpretable rule-based classifiers before. In this work, we present a multi-view extension of a grammar-based genetic programming model for inducing rules in semi-supervised contexts. Its idea is to evolve several populations, and their corresponding views, favouring both the accuracy of the predictions for the labelled patterns and the prediction agreement with the other views for the unlabelled ones. We have carried out experiments with two to five views, on six common datasets for fully-supervised learning that have been partially anonymised for our semi-supervised study. Our results show that the multi-view paradigm makes it possible to obtain slightly better rule-based classifiers, and that two views are preferred.
- Published
- 2017
11. On the effect of local search in the multi-objective evolutionary discovery of software architectures
- Author
Sebastián Ventura, José Raúl Romero, and Aurora Ramírez
- Subjects
Computer science; business.industry; Search-based software engineering; Evolutionary algorithm; 020207 software engineering; 02 engineering and technology; Machine learning; computer.software_genre; Data science; Evolutionary computation; Business process discovery; Software; Software construction; Component-based software engineering; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Local search (optimization); Artificial intelligence; business; Software architecture; computer; Software design description
- Abstract
Software architects devote substantial effort to finding the most fitting architectural description for their system, which should not only specify its structure but also meet multiple, simultaneous quality criteria. Evolutionary computation has recently been shown to provide insightful support during the design phase by automatically deciding how to organise internal software components and how they should interact with each other. Observed from a multi-objective perspective, particular care has to be taken to reach an appropriate trade-off among design metrics, while providing the software engineer with diverse alternatives to choose among. However, multi-objective evolutionary algorithms may have difficulty controlling both aspects and, at the same time, exploring the entire search space in depth. Under these circumstances, local search can be applied to complement the evolution by scrutinising the most promising search directions. This paper proposes two different approaches that take advantage of the benefits of local search within the multi-objective evolutionary discovery of component-based software architectures. A detailed analysis and comparative study provide interesting findings, such as the importance of assigning a sufficient number of evaluations to the local improvement. The way in which local search explores and compares solutions for acceptance is also a relevant aspect for promoting diversity during the discovery process.
- Published
- 2017
12. An evolutionary algorithm for mining rare association rules: A Big Data approach
- Author
José María Luna, Sebastián Ventura, and Francisco Padillo
- Subjects
Fitness function; Association rule learning; business.industry; Computer science; Big data; Evolutionary algorithm; 02 engineering and technology; Machine learning; computer.software_genre; Field (computer science); 020204 information systems; Spark (mathematics); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Algorithm design; Artificial intelligence; Data mining; business; computer
- Abstract
Association rule mining is one of the most well-known techniques to discover interesting relations between items in data. To date, this task has been mainly focused on the discovery of frequent relationships. However, it is often interesting to focus on those that do not occur frequently. Rare association rule mining is an alluring field aiming at describing rare cases or unexpected behavior. This field is particularly useful over Big Data, where abnormal behavior is of more interest than common behavior. In this sense, our aim is to propose a new evolutionary algorithm based on grammars to obtain rare association rules on Big Data. The novelty of our work is that it is eminently designed to be parallel, enabling its use over emerging technologies such as Spark and Flink. Furthermore, while other algorithms focus on maximizing a couple of quality measures while ignoring the rest, our fitness function has been precisely designed to obtain a trade-off while maximizing a set of well-known quality measures. The experimental study includes more than 70 datasets, revealing alluring results in efficiency when more than 300 million instances and file sizes up to 250 GBytes are considered, and proving that it is able to run efficiently on huge volumes of data.
- Published
- 2017
13. An evolutionary algorithm for optimizing the target ordering in Ensemble of Regressor Chains
- Author
Eva Gibaja, Sebastián Ventura, and Jose M. Moyano
- Subjects
0209 industrial biotechnology; business.industry; Evolutionary algorithm; Pattern recognition; macromolecular substances; 02 engineering and technology; Evolutionary computation; Regression; Correlation; 020901 industrial engineering & automation; Chain (algebraic topology); 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; Artificial intelligence; Regression algorithm; business; Selection (genetic algorithm); Mathematics
- Abstract
In this article, we present an evolutionary algorithm for the optimization of sequences of targets for the multi-target regression algorithm Ensemble of Regressor Chains. This algorithm selects several random sequences, or chains, of targets in which, to predict each target, the values of the previous targets in the chain are included as features, thereby taking the relationships among them into account. Under the assumption that a target may be better predicted if it is highly correlated with the targets that were included as features, our proposal, called CCO-ERC, looks for chains where each target is highly correlated with the previous targets in the chain. Several methods for combining the predictions in the ensemble and for selecting the chains that form the ensemble are also proposed. CCO-ERC is compared to other state-of-the-art algorithms in multi-target regression, showing statistically better performance than them.
- Published
- 2017
14. Subgroup discovery on big data: Pruning the search space on exhaustive search algorithms
- Author
Francisco Padillo, Sebastián Ventura, and José María Luna
- Subjects
Computer science; business.industry; Big data; Brute-force search; 02 engineering and technology; Data structure; Machine learning; computer.software_genre; Set (abstract data type); 020204 information systems; Spark (mathematics); 0202 electrical engineering, electronic engineering, information engineering; Unsupervised learning; 020201 artificial intelligence & image processing; Algorithm design; Pruning (decision trees); Artificial intelligence; Data mining; business; Algorithm; computer
- Abstract
Subgroup Discovery is a broadly applicable supervised local pattern mining method to search for relations between different properties with respect to a target variable. With the exponential growth in data storage, the massive data gathered has hampered the performance of current techniques. In this regard, our aim is to propose two new algorithms to discover subgroups on Big Data by using MapReduce. Apache Spark was used to tackle the Big Data requirements. The experimental study includes more than 50 large datasets and a set of efficient algorithms. Search spaces bigger than 1.276 · 10^15 subgroups are used. The experimental study reveals alluring results in efficiency when optimistic estimates are considered, and demonstrates the usefulness of using Apache Spark to tackle Big Data.
- Published
- 2016
15. Subgroup Discovery on Big Data: Exhaustive Methodologies Using Map-Reduce
- Author
Sebastián Ventura, Francisco Padillo, and José María Luna
- Subjects
business.industry; Computer science; Property (programming); Big data; 02 engineering and technology; computer.software_genre; Field (computer science); Set (abstract data type); 020204 information systems; Face (geometry); Computer data storage; Spark (mathematics); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Algorithm design; Data mining; business; computer
- Abstract
Subgroup Discovery is a flexible supervised local pattern mining method whose aim is to discover interesting subgroups with respect to one property of interest. Although many efficient algorithms have been developed in this field, the growing interest in data storage has led to ever larger datasets, hampering the performance of these algorithms. In this paper, two new algorithms to discover subgroups on Big Data are proposed. In this regard, the MapReduce paradigm has been considered and, in particular, Apache Spark was used to meet the Big Data requirements. The experimental study considers more than 40 high-dimensional datasets and a set of efficient algorithms from the subgroup discovery field. Search spaces bigger than 3.3 · 10^13 available subgroups are used. The experimental analysis demonstrates that the proposed algorithms obtain excellent results in efficiency, demonstrating the usefulness of Apache Spark in the field.
- Published
- 2016
16. Facing Up Fare War: Generating Competitive Price Models With Gene Expression Programming
- Author
Marco Antonio Barron, Jose Maria Luna, and Sebastian Ventura
- Subjects
Gene expression; airline fare war; classification; recommender systems; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
- Abstract
In the airline industry, the Revenue and Pricing teams generally spend a considerable amount of time analysing and interpreting the actions of their competitors. Most of the time the analysts have to use their analytical skills to create ad-hoc methods to interpret or find patterns in the fares. In this field, it is key to automate the process, avoid human errors, and add new features that provide accurate fares. Considering this, a gene expression programming algorithm is proposed to carry out this process, returning an interpretable rule set which acts as a recommender system to ease the daunting process done by the pricing teams manually. The proposal was applied to a real scenario with the information provided by the Air Canada airline for five months in Canadian markets. The experimental analysis revealed, by means of non-parametric statistical tests, that the proposed gene expression programming algorithm was key to getting the appropriate features and, hence, accurate and highly interpretable results. The proposal obtained extremely accurate results (around 96% in both accuracy and F1 measure) with a reduction of around 50% in the rule set in many cases.
- Published
- 2022
17. Reducing the Label Space a Predefined Ratio for a More Efficient Multilabel Classification
- Author
Jose M. Moyano, Jose M. Luna, and Sebastian Ventura
- Subjects
Algorithm efficiency; binary classification; dimensionality reduction; label space reduction; multi-label classification; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
- Abstract
The multi-label classification task has been widely used to solve problems where each instance may be related not only to one class but to many of them simultaneously. Many of these problems comprise a high number of labels in the output space, so learning a predictive model from such datasets may turn into a challenging task, since the computational complexity of most algorithms depends on the number of labels. In this paper, we propose a methodology to reduce the label space by a user-predefined ratio of labels, aiming to improve the runtime of multi-label classification algorithms. Such a reduction should be done without producing a significant drop in their final predictive performance. The experimental analysis, carried out over 25 well-known multi-label datasets, demonstrates a drastic reduction in the runtime. Besides, it is statistically shown that reducing the number of labels by 20% does not lead to a decrease in the predictive performance of the multi-label algorithms on four well-known evaluation measures. Moreover, in many cases, even when the output space is reduced by up to 50%, the predictive performance of the algorithms is not significantly different from that obtained using the whole set of labels.
- Published
- 2022
18. On the use of ant programming for mining rare association rules
- Author
José Raúl Romero, Sebastián Ventura, and Juan Luis Olmo
- Subjects
Optimization problem; Association rule learning; business.industry; Computer science; Context-free grammar; Machine learning; computer.software_genre; Field (computer science); Inductive programming; Domain (software engineering); Set (abstract data type); Artificial intelligence; Data mining; Automatic programming; business; computer
- Abstract
Most research in association rule mining has focused on the extraction of frequent and reliable associations. However, there is an increasing interest in finding reliable rules that rarely appear and, recently, some classical solutions have been adapted to this field. The problem is that most of these algorithms follow an exhaustive approach, which has the drawback of becoming unfeasible when dealing with highly complex data sets. This kind of problem can also be addressed as an optimization problem, for which bio-inspired algorithms have proved their ability. To this end, this paper presents an ant-based automatic programming method for discovering rare association rules. This algorithm lacks the drawbacks of exhaustive approaches and also has some advantages, such as the use of a context-free grammar that allows the algorithm to be adapted to a particular domain. Results show that this proposal can mine a set of reliable infrequent rules in a short period of time.
- Published
- 2013
19. Binary and multiclass imbalanced classification using multi-objective ant programming
- Author
Juan Luis Olmo, José Raúl Romero, Alberto Cano, and Sebastián Ventura
- Subjects
Programming algorithm; Computer science; business.industry; Binary number; computer.software_genre; Machine learning; Imbalanced data; Domain (software engineering); Task (project management); Multiclass classification; ComputingMethodologies_PATTERNRECOGNITION; Rule-based machine translation; Data mining; Artificial intelligence; business; computer
- Abstract
Classification in imbalanced domains is a challenging task, since most of its real domain applications present skewed distributions of data. However, there are still some open issues in this kind of problem. This paper presents a multi-objective grammar-based ant programming algorithm for imbalanced classification, capable of addressing this task from both the binary and multiclass sides, unlike most of the solutions presented so far. We carry out two experimental studies comparing our algorithm against binary and multiclass solutions, demonstrating that it achieves an excellent performance for both binary and multiclass imbalanced data sets.
- Published
- 2012
20. VisualJCLEC: A visual framework for evolutionary computation
- Author
José Raúl Romero, Juan Ignacio Jaen, and Sebastián Ventura
- Subjects
Java Evolutionary Computation Toolkit; Theoretical computer science; Human-based evolutionary computation; Computer science; Knapsack problem; Scalability; Memetic algorithm; Interactive evolutionary computation; Evolutionary programming; Evolutionary computation
- Abstract
This paper presents VisualJCLEC, a visual framework based on JCLEC for Evolutionary Computing. In order to achieve a high degree of adaptability, the architecture and pattern design followed are focused on enhancing flexibility and scalability. For illustrative purposes, a case study of a classical optimization problem (the knapsack problem) using this framework is presented, as well as some guidelines on how to add new elements to the environment by means of CDL descriptors.
- Published
- 2012
21. Learning similarity metric to improve the performance of lazy multi-label ranking algorithms
- Author
Carlos Morell, Sebastián Ventura, and Oscar Reyes
- Subjects
Computer science; business.industry; Heuristic; Machine learning; computer.software_genre; Weighting; k-nearest neighbors algorithm; Similarity (network science); Metric (mathematics); Feature (machine learning); Learning to rank; Instance-based learning; Artificial intelligence; Data mining; business; computer
- Abstract
The definition of similarity metrics is one of the most important tasks in the development of nearest-neighbour and instance-based learning methods. Furthermore, the performance of lazy algorithms can be significantly improved with the use of an appropriate weight vector. In recent years, learning from multi-label data has attracted significant attention from many researchers, motivated by an increasing number of modern applications that contain this type of data. This paper presents a new method for feature weighting, defining a similarity metric as a heuristic to estimate the feature weights, and improving the performance of lazy multi-label ranking algorithms. The experimental stage shows the effectiveness of our proposal.
- Published
- 2012
22. A genetic programming free-parameter algorithm for mining association rules
- Author
-
Sebastián Ventura, José María Luna, José Raúl Romero, and Cristóbal Romero
- Subjects
Association rule learning ,Computer science ,Encoding (memory) ,Feature (machine learning) ,Evolutionary algorithm ,Brute-force search ,Genetic programming ,Data mining ,Genetic representation ,Context-free grammar ,computer.software_genre ,computer ,Algorithm - Abstract
This paper presents a free-parameter grammar-guided genetic programming algorithm for mining association rules. This algorithm uses a context-free grammar to represent individuals, encoding the solutions in a tree shape conforming to the grammar, so they are more expressive and flexible. The algorithm presented here has the advantages of using evolutionary algorithms for mining association rules, and it also solves the problem of tuning the huge number of parameters required by these algorithms. The main feature of this algorithm is the small number of parameters required, providing the possibility of discovering association rules in an easy way for non-expert users. We compare our approach to existing evolutionary and exhaustive search algorithms, obtaining important results and overcoming the drawbacks of both exhaustive search and evolutionary algorithms. The experimental stage reveals that this approach discovers frequent and reliable rules without parameter tuning.
- Published
- 2012
23. An EP algorithm for learning highly interpretable classifiers
- Author
-
Amelia Zafra, Sebastián Ventura, and Alberto Cano
- Subjects
Computer science ,business.industry ,Rule mining ,Machine learning ,computer.software_genre ,Evolutionary computation ,Knowledge-based systems ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Algorithm ,Evolutionary programming ,Interpretability - Abstract
This paper introduces an Evolutionary Programming algorithm for solving classification problems using highly interpretable IF-THEN classification rules. It is an algorithm aimed at maximizing the comprehensibility of the classifier by minimizing the number of rules and employing only relevant attributes. The proposal is evaluated and compared to five other well-known classification techniques over 18 datasets. The results obtained from the experiments show its competitive accuracy and the significantly better interpretability of the classifiers provided in terms of number of rules, number of conditions and a complexity metric.
- Published
- 2011
24. Association rule mining using a multi-objective grammar-based ant programming algorithm
- Author
-
Sebastián Ventura, José María Luna, José Raúl Romero, and Juan Luis Olmo
- Subjects
Grammar ,Association rule learning ,Computer science ,business.industry ,media_common.quotation_subject ,Association (object-oriented programming) ,Pareto principle ,Brute-force search ,Genetic programming ,Context-free grammar ,Machine learning ,computer.software_genre ,Ranking ,Artificial intelligence ,business ,computer ,media_common - Abstract
This paper presents a method for extracting association rules by means of a multi-objective grammar guided ant programming algorithm. Solution construction is guided by a context-free grammar specifically suited for association rule mining, which defines the search space of all possible expressions or programs. Evaluation of individuals is considered from a Pareto-based point of view, measuring support and confidence of rules mined, and assigning them a ranking fitness. The proposed algorithm is verified over 10 varied data sets and compared to other association rule mining algorithms from several paradigms such as exhaustive search, genetic algorithms and genetic programming, showing that ant programming is also a good technique for addressing the association task of data mining.
- Published
- 2011
25. Mining and representing rare association rules through the use of genetic programming
- Author
-
José Raúl Romero, Sebastián Ventura, and José María Luna
- Subjects
Association rule learning ,Grammar ,business.industry ,Computer science ,media_common.quotation_subject ,Brute-force search ,Genetic programming ,Context-free grammar ,computer.software_genre ,Machine learning ,Field (computer science) ,Domain (software engineering) ,Set (abstract data type) ,Artificial intelligence ,Data mining ,business ,computer ,media_common - Abstract
Whereas the extraction of frequent patterns has been the focus of most research in association rule mining, the need for reliable rules that do not appear frequently is attracting increasing interest in a great number of areas. This field has not been explored in depth and most algorithms for mining infrequent association rules follow an exhaustive search methodology, which hampers the extraction process because of the size of the datasets. The importance of discovering patterns that do not frequently appear in a dataset and the promising results obtained when using evolutionary proposals in the field of frequent pattern mining motivate the evolutionary proposal for discovering rare association rules presented in this paper. Here, a context-free grammar is described and applied to adapt individuals to each particular problem or domain. The use of an evolutionary approach reduces the memory requirements, and the use of a context-free grammar provides the possibility of extracting any kind of rules. The experimental study shows that this proposal obtains a set of reliable infrequent rules in a short period of time.
- Published
- 2011
26. Subgroup discovery in an e-learning usage study based on Moodle
- Author
-
M. J. del Jesus, Sebastián Ventura, Cristóbal J. Carmona, and Pedro González
- Subjects
Computer science ,business.industry ,E-learning (theory) ,Fuzzy control system ,Machine learning ,computer.software_genre ,Electronic learning ,Evolutionary computation ,Algorithm design ,The Internet ,Artificial intelligence ,Computer aided instruction ,business ,computer - Abstract
This paper presents an experimental study with several subgroup discovery algorithms using data from a web-based education system. The main objective of this contribution is to extract unusual subgroups to describe possible relationships between the use of the e-learning platform and marks obtained by the students. The results obtained by the best performing algorithm, NMEEF-SD, are also presented. Finally, the most representative results obtained by this algorithm are analysed, in order to obtain knowledge that can allow teachers to take actions to improve student performance.
- Published
- 2011
27. Self-evaluation first ECTS course in a programming subject
- Author
-
Eva Gibaja, María Dolores Rubio Luque, Amelia Zafra, and Sebastián Ventura
- Subjects
Further education ,Multimedia ,Computer science ,Process (engineering) ,Teaching method ,Subject (documents) ,computer.software_genre ,Unit (housing) ,Course (navigation) ,Self evaluation ,ComputingMilieux_COMPUTERSANDEDUCATION ,Mathematics education ,Set (psychology) ,computer - Abstract
This paper presents our experience in a programming course unit during its first year of EHEA. The course unit features described are the students' profile, teaching methodology and assessment criteria. The virtualisation process and the self-evaluation carried out are presented, concluding our analysis with a set of discussions and recommendations to improve our next teaching course.
- Published
- 2011
28. An evaluation of the effectiveness of e-learning system as support for traditional classes
- Author
-
Eva Gibaja, Amelia Zafra, Sebastián Ventura, and María Dolores Rubio Luque
- Subjects
Face-to-face ,Multimedia ,Computer science ,Process (engineering) ,E-learning (theory) ,Face (sociological concept) ,Virtual learning environment ,Subject (documents) ,computer.software_genre ,Virtualization ,computer ,Telecommunications network - Abstract
Virtual learning environments (VLE) offer a continuous learning system where information, resources and experiences are always available. Currently, these systems are widely used as a support for face-to-face classes. In this sense, they make communication with students easier and maintain activities and resources for the subject. This paper presents the design and development of a subject using an e-learning platform, analyses the number of hits by the students, and evaluates the students' performance. First, virtualization is presented using the Moodle platform and developing a learning model based on the development of cooperative and collaborative activities. Subsequently, a study on the impact of using this tool in the learning process is carried out.
- Published
- 2011
29. Feature selection is the ReliefF for multiple instance learning
- Author
-
Mykola Pechenizkiy, Sebastián Ventura, and Amelia Zafra
- Subjects
Computer science ,business.industry ,Dimensionality reduction ,Supervised learning ,Pattern recognition ,Feature selection ,Filter (signal processing) ,Machine learning ,computer.software_genre ,Set (abstract data type) ,Kernel (linear algebra) ,Algorithm design ,Artificial intelligence ,business ,computer ,Curse of dimensionality - Abstract
Dimensionality reduction and feature selection in particular are known to be of a great help for making supervised learning more effective and efficient. Many different feature selection techniques have been proposed for the traditional settings, where each instance is expected to have a label. In multiple instance learning (MIL) each example or bag consists of a variable set of instances, and the label is known for the bag as a whole, but not for the individual instances it consists of. Therefore, utilizing class labels for feature selection in MIL is not that straightforward and traditional approaches for feature selection are not directly applicable. This paper proposes a filter feature selection approach based on the ReliefF technique. It allows any previously designed MIL method to benefit from our feature selection approach, which helps to cope with the curse of dimensionality. Experimental results show the effectiveness of the proposed approach in MIL — different MIL algorithms tend to perform better when applied after the dimensionality reduction.
- Published
- 2010
30. A TDIDT technique for multi-label classification
- Author
-
Sebastián Ventura, Manuel Victoriano, José Luis Ávila-Jiménez, and Eva Gibaja
- Subjects
Multi-label classification ,Computer science ,business.industry ,Decision tree ,Pattern recognition ,Machine learning ,computer.software_genre ,Ensemble learning ,Statistical classification ,ComputingMethodologies_PATTERNRECOGNITION ,C4.5 algorithm ,Entropy (information theory) ,Algorithm design ,Artificial intelligence ,business ,computer - Abstract
There are numerous problems of increasing significance where a pattern can have several classes simultaneously associated. These kinds of problems, usually called multi-label problems, should be tackled with specific techniques in order to generate models more accurate than those obtained with classical classification algorithms. This work presents the adaptation of the J48 algorithm to multi-label classification. The developed algorithm allows the generation of interpretable models and has been tested over several datasets. Experiments show that its performance is similar to that of other multi-label tree-based approaches, and that it is especially suitable for use as a base classifier in an ensemble.
- Published
- 2010
31. An intruder detection approach based on infrequent rating pattern mining
- Author
-
Aurora Ramírez, Sebastián Ventura, José María Luna, and José Raúl Romero
- Subjects
Association rule learning ,Computer science ,business.industry ,Evolutionary algorithm ,Intelligent decision support system ,Genetic programming ,Context-free grammar ,Recommender system ,Machine learning ,computer.software_genre ,Proof of concept ,Scalability ,Artificial intelligence ,Data mining ,business ,computer - Abstract
This work presents a novel proposal for incremental intruder detection in collaborative recommender systems. We explore the use of rare association rule mining to reveal the existence of a suspected raid of attackers that would alter the normal behaviour of a rating-based system. In this position paper we have extended our previous G3PARM algorithm, which has already proven to serve as a solid method for extracting frequent association rules. G3PARM is an evolutionary algorithm that uses G3P (Grammar Guided Genetic Programming), which provides enough expressiveness and flexibility to adapt and apply the base context-free grammar to each specific problem or domain. We fully outline, moreover, the complete exploration and detection model, which includes some further post-analysis steps. Finally, as a proof of concept, we validate the scalability, efficiency and accuracy of our proposal, showing the results obtained when different malicious intruders want to attack an online recommender system.
- Published
- 2010
32. A grammar based Ant Programming algorithm for mining classification rules
- Author
-
Sebastián Ventura, Juan Luis Olmo, and José Raúl Romero
- Subjects
Grammar ,Programming algorithm ,Heuristic ,Computer science ,business.industry ,media_common.quotation_subject ,Context-free grammar ,computer.software_genre ,Machine learning ,Artificial intelligence ,Data mining ,Automatic programming ,business ,Classifier (UML) ,computer ,media_common - Abstract
This paper focuses on the application of a new ACO-based automatic programming algorithm to the classification task of data mining. This new model, called GBAP algorithm, is based on a context-free grammar that properly guides the creation of new valid individuals. Moreover, its most differentiating factors, such as the use of two complementary heuristic measures for every transition rule, as well as the way it assigns a consequent and evaluates the extracted rules, are also discussed. These features enhance the final rule compilation from the output classifier. The performance of the proposed algorithm is evaluated and compared against other top algorithms, and the results obtained over 17 diverse data sets show that our approach reaches pretty competitive and even better accuracy values than those resulting from the other algorithms considered in the experimentation.
- Published
- 2010
33. G3PARM: A Grammar Guided Genetic Programming algorithm for mining association rules
- Author
-
Sebastián Ventura, José María Luna, and José Raúl Romero
- Subjects
education.field_of_study ,Theoretical computer science ,Weighted Majority Algorithm ,Association rule learning ,business.industry ,Cultural algorithm ,Computer science ,Population-based incremental learning ,Population ,Pareto principle ,Evolutionary algorithm ,Genetic programming ,Machine learning ,computer.software_genre ,Hybrid algorithm ,Evolutionary computation ,GSP Algorithm ,Genetic algorithm ,In-place algorithm ,Artificial intelligence ,education ,business ,computer ,Evolutionary programming ,FSA-Red Algorithm - Abstract
This paper presents the G3PARM algorithm for mining representative association rules. G3PARM is an evolutionary algorithm that uses G3P (Grammar Guided Genetic Programming) and an auxiliary population made up of its best individuals, who will then act as parents for the next generation. Due to the nature of G3P, the G3PARM algorithm allows us to obtain valid individuals by defining them through a context-free grammar and, furthermore, this algorithm is generic with respect to data type. We compare our algorithm to two multiobjective algorithms frequently used in the literature, known as NSGA2 (Non dominated Sort Genetic Algorithm) and SPEA2 (Strength Pareto Evolutionary Algorithm), and demonstrate the efficiency of our algorithm in terms of running time, coverage and average support, providing the user with highly representative rules.
- Published
- 2010
34. Evolutionary algorithms for subgroup discovery applied to e-learning data
- Author
-
Sebastián Ventura, Pedro González, M. J. del Jesus, Cristóbal J. Carmona, and Cristóbal Romero
- Subjects
Association rule learning ,Computer science ,business.industry ,Evolutionary algorithm ,Application software ,computer.software_genre ,Machine learning ,Educational data mining ,Evolutionary computation ,Statistical classification ,Knowledge extraction ,Learning Management ,Artificial intelligence ,business ,computer - Abstract
This work presents the application of subgroup discovery techniques to e-learning data from learning management systems (LMS) of Andalusian universities. The objective is to extract rules describing relationships between the use of the different activities and modules available in the e-learning platform and the final mark obtained by the students. For this purpose, the results of different classical and evolutionary subgroup discovery algorithms are compared, showing the adequacy of the evolutionary algorithms to solve this problem. Some of the rules obtained are analyzed with the aim of extracting knowledge that allows teachers to take actions to improve the performance of their students.
- Published
- 2010
35. Multiple Instance Learning with MultiObjective Genetic Programming for Web Mining
- Author
-
Amelia Zafra, Sebastián Ventura, and Eva Gibaja
- Subjects
Statistical classification ,ComputingMethodologies_PATTERNRECOGNITION ,Knowledge extraction ,Web mining ,Computer science ,business.industry ,Population-based incremental learning ,Scalability ,Web page ,Genetic programming ,Algorithm design ,Artificial intelligence ,business - Abstract
This paper introduces a multiobjective grammar based genetic programming algorithm to solve a Web Mining problem from a multiple instance perspective. This algorithm, called MOG3P-MI, is evaluated and compared with other available algorithms which extend a well-known neighborhood-based algorithm (k-nearest neighbour algorithm) and with a mono-objective version of grammar guided genetic programming, G3P-MI. Computational experiments show that the MOG3P-MI algorithm obtains the best results, solves problems of k-nearest neighbour algorithms, such as sparsity and scalability, adds comprehensibility and clarity to the knowledge discovery process, and outperforms the mono-objective version.
- Published
- 2008
36. Exceptional in so Many Ways—Discovering Descriptors That Display Exceptional Behavior on Contrasting Scenarios
- Author
-
Jose Maria Luna, Mykola Pechenizkiy, Wouter Duivesteijn, and Sebastian Ventura
- Subjects
Exceptional model mining ,exceptional patterns ,supervised descriptive pattern mining ,rank correlation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The current state of the art in supervised descriptive pattern mining is very good in automatically finding subsets of the dataset at hand that are exceptional in some sense. The most common form, subgroup discovery, generally finds subgroups where a single target variable has an unusual distribution. Exceptional model mining (EMM) typically finds subgroups where a pair of target variables display an unusual interaction. What these methods have in common is that one specific exceptionality is enough to flag up a subgroup as exceptional. This, however, naturally leads to the question: can we also find multiple instances of exceptional behaviour simultaneously in the same subgroup? This paper provides a first, affirmative answer to that question in the form of the SPEC (Subsets of Pairwise Exceptional Correlations) model class for EMM. Given a set of predefined numeric target variables, SPEC will flag up subgroups as interesting if multiple target pairs display an unusual rank correlation. This is a fundamental extension of the EMM toolbox, which comes with additional algorithmic challenges. To address these challenges, we provide a series of algorithmic solutions whose strengths/flaws are empirically analysed.
- Published
- 2020
37. Extracting User-Centric Knowledge on Two Different Spaces: Concepts and Records
- Author
-
Jose Maria Luna, Philippe Fournier-Viger, and Sebastian Ventura
- Subjects
Pattern mining ,space of concepts ,space of records ,user-centric knowledge ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The growing demand for eliciting useful knowledge from data calls for techniques that can discover insights (in the form of patterns) that users need. Methodologies for describing intrinsic and relevant properties of data through the extraction of useful patterns, however, work on fixed input data, and the data representation, therefore, constrains the discovered insights. In this regard, this paper aims at providing foundations to make the descriptive knowledge that is extracted by pattern mining more user-centric by relying on flexible data structures defined on two different perspectives: concepts and data records. In this sense, items in data can be grouped into abstract terms through subjective hierarchies of concepts, whereas data records can also be organized based on the users' subjective perspective. A series of easy-to-follow toy examples are considered for each of the two perspectives to demonstrate the usefulness and necessity of the proposed foundations in pattern mining. Finally, aiming at experimentally testing whether classical pattern mining algorithms can be adapted to such flexible data structures, the experimental analysis comprises different methodologies, including exhaustive search, random search, and evolutionary approaches. All these approaches are based on well-known and widely recognized techniques to demonstrate the usefulness of the provided foundations for future research works and more efficient and specifically designed algorithms. Obtained insights demonstrate the importance of working with subjectivity: an item is a type of soda but belongs to a pack, including two or more soda types.
- Published
- 2020
38. 12th International Conference on Intelligent Systems Design and Applications, ISDA 2012, Kochi, India, November 27-29, 2012
- Author
-
Ajith Abraham, Albert Y. Zomaya, Sebastián Ventura, Ronald R. Yager, Václav Snásel, Azah Kamilah Muda, and Philip Samuel
- Published
- 2012
39. 11th International Conference on Intelligent Systems Design and Applications, ISDA 2011, Córdoba, Spain, November 22-24, 2011
- Author
-
Sebastián Ventura, Ajith Abraham, Krzysztof J. Cios, Cristóbal Romero, Francesco Marcelloni, José Manuel Benítez, and Eva Lucrecia Gibaja Galindo
- Published
- 2011