63 results for "Andrzej Janusz"
Search Results
2. A Practical Study of Methods for Deriving Insightful Attribute Importance Rankings Using Decision Bireducts
- Author
- Andrzej Janusz, Dominik Ślęzak, Sebastian Stawicki, and Krzysztof Stencel
- Published
- 2023
3. BrightBox — A rough set based technology for diagnosing mistakes of machine learning models
- Author
- Andrzej Janusz, Andżelika Zalewska, Łukasz Wawrowski, Piotr Biczyk, Jan Ludziejewski, Marek Sikora, and Dominik Ślęzak
- Subjects
- Software
- Published
- 2023
4. IEEE BigData Cup 2022 Report: Privacy-preserving Matching of Encrypted Images
- Author
- Marcin Szczuka, Andrzej Janusz, Bogusław Cyganek, Jakub Grabek, Łukasz Przebinda, Andżelika Zalewska, Andrzej Bukala, and Dominik Ślęzak
- Published
- 2022
5. EVEAL - Expected Variance Estimation for Active Learning
- Author
- Daniel Kaluza, Andrzej Janusz, and Dominik Ślęzak
- Published
- 2022
6. KnowledgePit Meets BrightBox: A Step Toward Insightful Investigation of the Results of Data Science Competitions
- Author
- Andrzej Janusz and Dominik Ślęzak
- Published
- 2022
7. Prescriptive Analytics for Optimization of FMCG Delivery Plans
- Author
- Marek Grzegorowski, Andrzej Janusz, Stanisław Łażewski, Maciej Świechowski, and Monika Jankowska
- Published
- 2022
8. Data-Driven Resilient Supply Management Supported by Demand Forecasting
- Author
- Marek Grzegorowski, Andrzej Janusz, Jarosław Litwin, and Łukasz Marcinowski
- Published
- 2022
9. Predicting Victories in Video Games - IEEE BigData 2021 Cup Report
- Author
- Maciej Matraszek, Andrzej Janusz, Maciej Świechowski, and Dominik Ślęzak
- Published
- 2021
10. Thermally stimulated processes in low density polyethylene
- Author
- Andrzej Janusz Markiewicz
- Subjects
- Uncategorized
- Abstract
This thesis was scanned from the print manuscript for digital preservation and is copyright the author. Researchers can access this thesis by asking their local university, institution or public library to make a request on their behalf. Monash staff and postgraduate students can use the link in the References field.
- Published
- 2021
- Full Text
- View/download PDF
11. Predicting Escalations in Customer Support: Analysis of Data Mining Challenge Results
- Author
- Daniel Kaluza, Guohua Hao, Dominik Ślęzak, Andrzej Janusz, Robert Wojciechowski, and Tony Li
- Subjects
Event (computing), Computer science, Frame (networking), Big data, Data science, Data modeling, Software, Task analysis, Data analysis, Quality (business)
- Abstract
We summarize the IEEE Big Data Cup: Predicting Escalations in Customer Support – a data mining competition organized jointly by the companies Information Builders and QED Software on the KnowledgePit platform, as part of the 2020 IEEE International Conference on Big Data. We discuss the motivation for organizing this event and highlight the factors that make it such a challenging topic. We describe the data provided to participants and formulate the competition task. We also provide an overview of the competition results, with a detailed analysis of a few selected solutions. Finally, we present a novel functionality of the KnowledgePit platform – an analytic module that allows organizers to investigate selected solutions using a convenient GUI and provides in-depth insights into their quality.
- Published
- 2020
12. Network Device Workload Prediction: A Data Mining Challenge at Knowledge Pit
- Author
- Andrzej Janusz, Piotr Biczyk, Mateusz Przyborowski, and Dominik Ślęzak
- Subjects
Computer science, Workload prediction, Networking hardware, Competition (economics), Upload, Data mining, Baseline (configuration management)
- Abstract
We describe the 7th edition of the international data mining competition held at Knowledge Pit in association with the FedCSIS conference series. The goal was to predict workload-related characteristics of monitored network devices. We analyze solutions uploaded by the most successful participants. We investigate prediction errors which had the greatest influence on their results. We also present our own baseline solution which turned out to be the most reliable in the final evaluation.
- Published
- 2020
13. On Positive-Correlation-Promoting Reducts
- Author
- Joanna Henzel, Andrzej Janusz, Marek Sikora, and Dominik Ślęzak
- Subjects
Discrete mathematics, Rule induction, Binary number, Feature selection, Positive correlation, R package, Knowledge extraction, Rough set, Mathematics
- Abstract
We introduce a new rough-set-inspired binary feature selection framework, whereby it is preferred to choose attributes which let us distinguish between objects (cases, rows, examples) having different decision values according to the following mechanism: for objects u1 and u2 with decision values dec(u1) = 0 and dec(u2) = 1, it is preferred to select attributes a such that a(u1) = 0 and a(u2) = 1, with the secondary option – if the first one is impossible – to select a such that a(u1) = 1 and a(u2) = 0. We discuss the background for this approach, originally inspired by the needs of genetic data analysis. We show how to derive the sets of such attributes – called positive-correlation-promoting reducts (PCP reducts for short) – using standard calculations over appropriately modified rough-set-based discernibility matrices. The proposed framework is implemented within the RoughSets R package, which is widely used for data exploration and knowledge discovery purposes.
- Published
- 2020
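The per-pair attribute-preference rule described in the abstract above can be sketched in a few lines. The function name and the toy decision table below are invented for illustration; this is not the paper's code:

```python
# Illustrative sketch of the attribute-preference rule behind PCP reducts.
# (Function name and toy data are assumptions, not from the paper.)

def pcp_preferred_attributes(u1, u2, attributes):
    """For objects with dec(u1) = 0 and dec(u2) = 1, return the attributes
    that discern them, preferring the positive-correlation-promoting
    direction a(u1) = 0 and a(u2) = 1; fall back to the opposite one."""
    primary = [a for a in attributes if u1[a] == 0 and u2[a] == 1]
    if primary:
        return primary
    return [a for a in attributes if u1[a] == 1 and u2[a] == 0]

# Toy binary decision table: u1 has decision 0, u2 has decision 1.
u1 = {"a1": 0, "a2": 1, "a3": 0}
u2 = {"a1": 1, "a2": 0, "a3": 0}

print(pcp_preferred_attributes(u1, u2, ["a1", "a2", "a3"]))  # ['a1']
```

Deriving full PCP reducts additionally requires the modified discernibility-matrix calculations mentioned in the abstract; the sketch only shows the preference applied to a single pair of objects.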
14. IEEE BigData 2019 Cup: Suspicious Network Event Recognition
- Author
- Agnieszka Chądzyńska-Krasowska, Daniel Kaluza, Dominik Ślęzak, Joel Holland, Bartek Konarski, and Andrzej Janusz
- Subjects
Feature engineering, Competition (economics), Recurrent neural network, Software, Analytics, Computer science, Big data, Baseline (configuration management), Data science, Scope (computer science)
- Abstract
“IEEE BigData 2019 Cup: Suspicious Network Event Recognition” was a data mining competition organized jointly by the companies Security On-Demand and QED Software on the KnowledgePit online platform, in association with the IEEE BigData 2019 conference. The scope of this challenge covered the notions of cybersecurity analytics and network alert evaluation. In this paper, we summarize the results of the competition. We explain how the data sets were prepared before they could be made available to competition participants. We describe the baseline scoring models that we designed as a reference for participants, and we demonstrate how critical appropriate feature engineering was to their performance. We also discuss the results of experiments conducted to verify the (un)suitability of deep recurrent neural networks in this particular case. In some sense, we show that there are no “perfect” machine learning approaches that could be applied equally successfully to every data science undertaking.
- Published
- 2019
15. A framework for learning and embedding multi-sensor forecasting models into a decision support system: A case study of methane concentration in coal mines
- Author
- Marek Sikora, Dominik Ślęzak, Andrzej Janusz, Łukasz Wróbel, Marek Grzegorowski, Michał Kozielski, Sebastian Stawicki, and Sinh Hoa Nguyen
- Subjects
Decision support system, Information Systems and Management, Computer science, Feature extraction, Coal mining, Computer Science Applications, Theoretical Computer Science, Task (project management), Data set, Artificial Intelligence, Control and Systems Engineering, Feature (computer vision), Analytics, Embedding, Software
- Abstract
We introduce a new approach for learning forecasting models over large multi-sensor data sets, including the steps of sliding-window-based feature extraction and rough-set-inspired feature subset ensemble selection. We show how to integrate this approach with the major data-processing-related components of DISESOR – a decision support system which is a coherent and complete framework for exploring streams of sensor readings registered in underground coal mines. As a case study, we report our experiments related to the task of methane concentration forecasting. The contributions of this paper comprise both an analysis of how the nature of sensor readings influenced the architecture of the developed system and empirical evidence that the designed methods for data processing and analytics turned out to be efficient in practice.
- Published
- 2018
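The sliding-window feature extraction step mentioned in the abstract above can be illustrated with a minimal sketch. The chosen window statistics and the toy sensor series are assumptions for illustration, not the DISESOR implementation:

```python
# Minimal sliding-window feature extraction over a univariate sensor series.
# (Window size, step, and the selected statistics are illustrative choices.)

def window_features(readings, width, step):
    """Emit basic aggregate features for each window of the series."""
    feats = []
    for start in range(0, len(readings) - width + 1, step):
        w = readings[start:start + width]
        feats.append({
            "mean": sum(w) / width,
            "min": min(w),
            "max": max(w),
            "trend": w[-1] - w[0],  # crude slope proxy over the window
        })
    return feats

series = [0.4, 0.5, 0.7, 0.6, 0.9, 1.1]  # e.g., consecutive sensor readings
for f in window_features(series, width=3, step=2):
    print(f)
```

In a multi-sensor setting the same extraction would run per sensor, and the resulting feature vectors would feed the forecasting models described in the abstract.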
16. Cost Optimization for Big Data Workloads Based on Dynamic Scheduling and Cluster-Size Tuning
- Author
- Andrzej Janusz, Cas Apanowicz, Petre Lameski, Marek Grzegorowski, Eftim Zdravevski, and Dominik Ślęzak
- Subjects
Schedule, Data processing, Information Systems and Management, Computer science, Distributed computing, Big data, Spot market, Cloud computing, Dynamic priority scheduling, Total cost of ownership, Computer Science Applications, Management Information Systems, Scalability, Information Systems
- Abstract
Analytical data processing has become the cornerstone of today's business success, and it is facilitated by Big Data platforms that offer virtually limitless scalability. However, minimizing the total cost of ownership (TCO) for the infrastructure can be challenging. We propose a novel method to build resilient clusters on cloud resources that are fine-tuned to the particular data processing task. The presented architecture follows the infrastructure-as-code paradigm so that the cluster can be dynamically configured and managed. It first identifies the optimal cluster size to perform a job in the required time. Then, by analyzing spot instance price history and using ARIMA models, it optimizes the schedule of the job execution to leverage the discounted prices of the cloud spot market. In particular, we evaluated savings opportunities when using Amazon EC2 spot instances compared to on-demand resources. The performed experiments confirmed that the prediction module significantly improved the cost-effectiveness of the solution – up to 80% savings compared to the on-demand prices and, in the worst case, only 1% more than the absolute minimum cost. The production deployments of the architecture show that it is invaluable for minimizing the total cost of ownership of analytical data processing solutions.
- Published
- 2021
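The scheduling idea from the abstract above (forecast spot prices, then place the job in the cheapest contiguous window) can be sketched as follows. A seasonal-naive forecast stands in for the paper's ARIMA models, and all prices are invented:

```python
# Sketch of spot-market-aware job scheduling. A seasonal-naive forecast
# replaces the ARIMA models used in the paper; prices are made-up examples.

def forecast_next_day(hourly_prices, period=24):
    """Seasonal-naive forecast: assume the next day repeats the last one."""
    return hourly_prices[-period:]

def cheapest_window(prices, job_hours):
    """Return (start_hour, total_cost) of the cheapest contiguous slot."""
    cost, start = min(
        (sum(prices[s:s + job_hours]), s)
        for s in range(len(prices) - job_hours + 1)
    )
    return start, cost

history = [0.10] * 20 + [0.03, 0.03, 0.04, 0.10]  # last 24 hourly spot prices
pred = forecast_next_day(history)
start, cost = cheapest_window(pred, job_hours=2)
print(start, round(cost, 2))  # schedule the 2-hour job at the cheapest slot
```

A production system would additionally bound the window search by the job's deadline and re-forecast as new price observations arrive.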
17. Decision bireducts and decision reducts – a comparison
- Author
- Dominik Ślęzak, Sebastian Widz, Andrzej Janusz, and Sebastian Stawicki
- Subjects
Computational complexity theory, Applied Mathematics, Decision tree, Evidential reasoning approach, Decision rule, Machine learning, Theoretical Computer Science, Artificial Intelligence, Influence diagram, Artificial intelligence, Completeness (statistics), Optimal decision, Software, Mathematics
- Abstract
In this paper we revise the notion of decision bireducts. We show new interpretations and prove several important and practically useful facts regarding this notion. We also explain how some of the well-known algorithms for the computation of decision reducts can be modified for the purpose of computing decision bireducts. For the sake of completeness of our study, we extend our investigations to relations between decision bireducts and so-called approximate decision reducts. We compare different formulations of those two approaches and draw analogies between them. We also report new results related to the NP-hardness of searching for optimal decision bireducts and approximate decision reducts in data. Finally, we present new results of empirical tests which demonstrate the usefulness of decision bireducts in the construction of efficient, yet simple, ensembles of classification models.
- Published
- 2017
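A decision bireduct pairs an attribute subset B with an object subset X such that B discerns every pair of objects in X that have different decision values. A hedged sketch of that consistency check, on a toy decision table (this is an illustration, not one of the paper's algorithms):

```python
# Check the basic bireduct condition: within object subset X, attribute
# subset B must discern every pair of objects with different decisions.
# (Toy decision table invented for illustration.)

def consistent_on(table, decisions, B, X):
    for i in X:
        for j in X:
            if decisions[i] != decisions[j]:
                if all(table[i][b] == table[j][b] for b in B):
                    return False
    return True

table = [
    {"a": 0, "b": 1},
    {"a": 0, "b": 0},
    {"a": 1, "b": 0},
]
decisions = [0, 1, 1]
print(consistent_on(table, decisions, {"b"}, {0, 1, 2}))  # True
print(consistent_on(table, decisions, {"a"}, {0, 1, 2}))  # False: 0 and 1 agree on a
print(consistent_on(table, decisions, {"a"}, {0, 2}))     # True: shrinking X helps
```

The trade-off visible in the last two calls, accepting a smaller X in exchange for a smaller B, is exactly what distinguishes bireducts from classical decision reducts.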
18. Clash Royale Challenge: How to Select Training Decks for Win-rate Prediction
- Author
- Andrzej Janusz, Marek Grzegorowski, and Łukasz Grad
- Subjects
Scope (project management), Computer science, Data science, Task (project management), Competition (economics), Active learning, Information system
- Abstract
We summarize the sixth data mining competition organized at the Knowledge Pit platform in association with the Federated Conference on Computer Science and Information Systems series, titled Clash Royale Challenge: How to Select Training Decks for Win-rate Prediction. We outline the scope of this challenge and briefly present its results. We also discuss the problem of acquiring knowledge about new notions from video games through an active learning cycle. We explain how this task is related to the problem considered in the challenge and share the results of experiments that we conducted to demonstrate the usefulness of the active learning approach in practice.
- Published
- 2019
19. Analytics over Multi-sensor Time Series Data – A Case-Study on Prediction of Mining Hazards
- Author
- Andrzej Janusz and Dominik Ślęzak
- Subjects
Decision support system, Computer science, Dimensionality reduction, Interchangeability, Task (project management), Subject matter expert, Analytics, Data mining, Time series, Predictive modelling
- Abstract
Mining high-dimensional time series data that represent readings of multiple sensors is a challenging task. We focus on several important aspects of analytics over such data. We describe a methodology for extracting informative features from multidimensional data streams, as well as algorithms for finding compact representations of such data, in order to facilitate the construction of prediction models. We pay special attention to designing new approaches to dimensionality reduction and to the interchangeability of the features that such representations comprise. We validate our algorithms on data sets obtained from coal mines and demonstrate how their results can be applied in the construction of a decision support system. We show that such a system is efficient and that its outcomes can be easily interpreted by subject matter experts.
- Published
- 2019
20. SENSEI: An Intelligent Advisory System for the eSport Community and Casual Players
- Author
- Andrzej Janusz, Krzysztof Stencel, Sebastian Stawicki, and Dominik Ślęzak
- Subjects
Advisory system, Analytics, Computer science, Human–computer interaction, Conceptual architecture, Advice (programming)
- Abstract
In this article, we describe the SENSEI system. It helps players to improve their skills in popular eSports games. We discuss the main goals of the system and explain the associated challenges. We also present its conceptual architecture which aims at enabling full automation of the data acquisition and analytic processes. The system is expected to provide in-depth analytics of players' performance and give practical advice regarding possible improvements. Thus its architecture allows players to provide feedback and manually label important concepts. Finally, we discuss our first case study - an advisory system for popular collectible card video games.
- Published
- 2018
21. Toward Machine Learning on Granulated Data – a Case of Compact Autoencoder-based Representations of Satellite Images
- Author
- Tomasz Tajmajer, Dominik Ślęzak, Andrzej Janusz, Piotr Biczyk, Mateusz Przyborowski, and Łukasz Grad
- Subjects
Computer science, Deep learning, Object recognition, Pattern recognition, Autoencoder, Image (mathematics), Satellite, Artificial intelligence, Image retrieval
- Abstract
We consider a problem of learning from compact representations of images for a purpose of object recognition and content-based image retrieval. We discuss a motivation for using compressed images in those tasks and indicate exemplary applications related to analysis on the data from satellites. Finally, we show some preliminary results of experiments conducted to demonstrate the impact of the image data granulation on the quality of classification. We empirically compare the performance of prediction models trained on original images, images compressed using autoencoders, and on images whose quality was lowered in order to reduce their size.
- Published
- 2018
22. Utilizing Hybrid Information Sources to Learn Representations of Cards in Collectible Card Video Games
- Author
- Andrzej Janusz, Dominik Ślęzak, and Łukasz Grad
- Subjects
Information retrieval, Computer science, Latent semantic analysis, Interchangeability, Task (project management), Similarity (psychology), Task analysis, Word2vec, The Internet
- Abstract
We investigate the problem of learning representations of cards in collectible card video games. Our goal is to utilize such representations in modeling contextual similarity between cards. When constructed appropriately, such similarity models can offer many benefits to players. In particular, one can employ them to recommend cheaper or more available card replacements in popular decks. To this end, we utilize some known NLP methods, such as word2vec and Latent Semantic Analysis, to extract card embeddings from their base characteristics and textual descriptions. We also propose two new approaches that make use of information regarding multiple decks constructed by the community of players and attempt to capture the notion of card interchangeability. We empirically validate the described methods and compare their performance using data obtained for two popular games, Hearthstone: Heroes of Warcraft and Clash Royale. In the experiments, we consider various representations of cards and then derive the corresponding similarities. To validate the compared methods, we check how consistent the similarity measurements they produce are with the assessments made by experienced players. The results of our analysis show that combining the outcomes of methods that work with different sources of information, i.e., textual descriptions of individual cards and deck-specific card co-occurrences, can improve performance in the task of similarity assessment. Moreover, a clustering of cards in the constructed vector space can provide some interesting insights for the community of players. As already mentioned, it can be used to suggest replacements for cards that players lack in their collections or to indicate cards that are likely to deteriorate the win chances of particular decks.
- Published
- 2018
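The deck-co-occurrence idea from the abstract above can be sketched by representing each card as a binary vector over decks and comparing cards with cosine similarity. The deck lists and card names below are invented for illustration:

```python
# Minimal co-occurrence-based card similarity: each card becomes a binary
# vector over decks (1 = card appears in that deck); cards that occur in
# the same decks come out as similar. Deck data is made up.

import math

def card_vectors(decks):
    """Map each card to a binary vector over decks."""
    cards = sorted({c for deck in decks for c in deck})
    return {c: [1 if c in deck else 0 for deck in decks] for c in cards}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

decks = [
    {"Fireball", "Knight", "Archers"},
    {"Fireball", "Knight", "Giant"},
    {"Zap", "Giant", "Archers"},
]
vecs = card_vectors(decks)
print(round(cosine(vecs["Fireball"], vecs["Knight"]), 2))  # 1.0: same deck profile
print(round(cosine(vecs["Fireball"], vecs["Zap"]), 2))     # 0.0: never share a deck
```

The paper's methods additionally combine such co-occurrence signals with text-based embeddings (word2vec, LSA) of card descriptions; this sketch shows only the co-occurrence half.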
23. Investigating Similarity between Hearthstone Cards: Text Embeddings and Interchangeability Approaches
- Author
- Dominik Ślęzak and Andrzej Janusz
- Subjects
Information retrieval, Computer science, Similarity (psychology), Word2vec, Video game, Interchangeability
- Abstract
We investigate similarities between cards and decks from the video game Hearthstone: Heroes of Warcraft. We utilize some NLP methods, such as word2vec and LSA, to learn card representations from their descriptions. We also attempt to quantify interchangeability of cards in Hearthstone decks. We experimentally validate the presented methods and compare their performance using the data obtained from players. The results of our analysis can help us to introduce a kind of clustering of the space of Hearthstone cards, for the purpose of providing new decision support tools for the community of players.
- Published
- 2018
24. Toward an Intelligent HS Deck Advisor: Lessons Learned from AAIA'18 Data Mining Competition
- Author
- Jacek Puczniewski, Tomasz Tajmajer, Andrzej Janusz, Dominik Ślęzak, Łukasz Grad, and Maciej Świechowski
- Subjects
Artificial neural network, Scope (project management), Computer science, Deck, Task (project management), Competition (economics), Task analysis, Data mining
- Abstract
We summarize the AAIA'18 Data Mining Competition organized at the Knowledge Pit platform. We explain the competition's scope and outline its results. We also review several approaches to the problem of representing Hearthstone decks in a vector space. We divide such approaches into categories based on the type of data about individual cards that they use. Finally, we outline experiments aimed at evaluating the usefulness of various deck representations for the task of win-rate prediction.
- Published
- 2018
25. How to Match Jobs and Candidates - A Recruitment Support System Based on Feature Engineering and Advanced Analytics
- Author
- Andrzej Janusz, Sebastian Stawicki, Dominik Ślęzak, Krzysztof Stencel, Krzysztof Ciebiera, and Michał Drewniak
- Subjects
Feature engineering, Data processing, Focus (computing), Computer science, Recommender system, Data science, Analytics, Domain knowledge, Word2vec, Architecture
- Abstract
We describe a recruitment support system that aims to help recruiters find candidates who are likely to be interested in a given job offer. We present the architecture of the system and explain the roles of its main modules. We also give examples of analytical processes supported by the system. In the paper, we focus on a data processing chain that utilizes domain knowledge for the extraction of meaningful features representing pairs of candidates and offers. Moreover, we discuss the usage of a word2vec model for finding concise vector representations of the offers, based on their short textual descriptions. Finally, we present the results of an empirical evaluation of our system.
- Published
- 2018
26. On the role of feature space granulation in feature selection processes
- Author
- Andrzej Janusz, Marcin Szczuka, Dominik Ślęzak, and Marek Grzegorowski
- Subjects
Theoretical computer science, Computer science, Feature vector, Feature extraction, Feature selection, Knowledge extraction, Feature (computer vision), Similarity (psychology), Rough set, Cluster analysis
- Abstract
Information granulation plays an important role in the process of scaling up modern machine learning and knowledge discovery algorithms. By employing compact descriptions of granules — whereby granules are defined as collections of original data elements gathered together by means of their similarity, proximity or functionality — one can drastically accelerate computations and, moreover, make the results of those computations more meaningful for domain experts. In this paper, we summarize some of the feature space granulation approaches introduced by now. We discuss the meaning of similarity, proximity and functionality while considering the granules of physically existing or potentially derivable attributes. We also show several examples of utilization of the granulation structures defined over the feature spaces in the feature selection algorithms. As a case study, we consider the algorithms developed within the theory of rough sets, aimed at finding irreducible subsets of attributes that are sufficient to distinguish between the cases belonging to different target decision classes.
- Published
- 2017
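The rough-set notion mentioned at the end of the abstract above (an irreducible subset of attributes sufficient to distinguish cases from different decision classes) is typically approximated with a greedy search. A minimal sketch on invented data, not the specific algorithms discussed in the paper:

```python
# Greedy rough-set-style attribute (feature) selection: repeatedly add the
# attribute that discerns the most not-yet-discerned pairs of objects from
# different decision classes. Toy data; assumes a consistent table (every
# such pair is discernible by some attribute).

from itertools import combinations

def greedy_reduct(table, decisions, attributes):
    pairs = [(i, j) for i, j in combinations(range(len(table)), 2)
             if decisions[i] != decisions[j]]
    chosen, remaining = [], list(attributes)
    while pairs:
        best = max(remaining,
                   key=lambda a: sum(table[i][a] != table[j][a]
                                     for i, j in pairs))
        chosen.append(best)
        remaining.remove(best)
        # keep only the pairs that are still indiscernible
        pairs = [(i, j) for i, j in pairs if table[i][best] == table[j][best]]
    return chosen

table = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]
decisions = [0, 1, 1]
print(greedy_reduct(table, decisions, ["a", "b", "c"]))  # ['b'] suffices here
```

Granulation enters this picture by grouping similar (interchangeable) attributes so that the greedy step considers cluster representatives instead of every attribute individually.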
27. Implementing algorithms of rough set theory and fuzzy rough set theory in the R package 'RoughSets'
- Author
- Dominik Ślęzak, Chris Cornelis, Francisco Herrera, Andrzej Janusz, José Manuel Benítez, Christoph Bergmeir, and Lala Septem Riza
- Subjects
Information Systems and Management, Discretization, Computer science, Rule induction, Dominance-based rough set approach, Feature selection, Computer Science Applications, Theoretical Computer Science, k-nearest neighbors algorithm, Data modeling, Artificial Intelligence, Control and Systems Engineering, Instance selection, Rough set, Data mining, Algorithm, Software
- Abstract
The RoughSets package, written mainly in the R language, provides implementations of methods from rough set theory (RST) and fuzzy rough set theory (FRST) for data modeling and analysis. It covers not only fundamental concepts (e.g., indiscernibility relations and lower/upper approximations), but also their applications in many tasks: discretization, feature selection, instance selection, rule induction, and nearest-neighbor-based classifiers. The package architecture and examples are presented in order to introduce it to researchers and practitioners. Researchers can build new models by defining custom functions as parameters, and practitioners are able to perform analysis and prediction of their data using the available algorithms. Additionally, we provide a review and comparison of well-known software packages. Overall, our package can be considered an alternative software library for analyzing data based on RST and FRST.
- Published
- 2014
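The fundamental RST concepts that the abstract lists (indiscernibility relations and lower/upper approximations) translate directly into code. This Python sketch mirrors the kind of computation the RoughSets R package performs, on an invented one-attribute table:

```python
# Indiscernibility classes and rough approximations, sketched in Python.
# (The table and target set are invented for illustration.)

def indiscernibility_classes(table, B):
    """Partition object indices by their values on attribute subset B."""
    classes = {}
    for i, row in enumerate(table):
        key = tuple(row[a] for a in B)
        classes.setdefault(key, set()).add(i)
    return list(classes.values())

def approximations(table, B, target):
    """Lower/upper approximation of the target set w.r.t. B-indiscernibility."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, B):
        if cls <= target:
            lower |= cls      # class entirely inside the target set
        if cls & target:
            upper |= cls      # class overlapping the target set
    return lower, upper

table = [{"a": 0}, {"a": 0}, {"a": 1}, {"a": 1}]
lower, upper = approximations(table, ["a"], target={1, 2, 3})
print(sorted(lower), sorted(upper))  # [2, 3] [0, 1, 2, 3]
```

The gap between the two sets (objects 0 and 1 here) is the boundary region: cases the attribute subset cannot classify decisively, which is where the FRST generalizations in the package come into play.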
28. Toward Interactive Attribute Selection with Infolattices – A Position Paper
- Author
- Andrzej Janusz, Marek Grzegorowski, Dominik Ślęzak, and Sebastian Stawicki
- Subjects
Information retrieval, Computer science, Feature selection, Context (language use), Data visualization, Formal concept analysis, Position paper, Rough set
- Abstract
We discuss a new approach to interactive exploration of high-dimensional data sets which is aimed at building a human's understanding of the data by iterative additions of recommended attributes and objects that can together represent a context in which it may be useful to analyze the data. We identify challenges and expected benefits that our methodology can bring to users. We also show how our ideas were inspired by Formal Concept Analysis (FCA) and Rough Set Theory (RST). It is worth emphasizing, though, that this particular paper is not aimed at investigating relationships between FCA and RST. Instead, the goal is to discuss which algorithmic methods developed within FCA and RST could be reused for the purposes of our approach.
- Published
- 2017
29. ISMIS 2017 Data Mining Competition: Trading Based on Recommendations
- Author
- Dominik Ślęzak, Kamil Żbikowski, Mathurin Aché, Marzena Kryszkiewicz, Henryk Rybiński, Andrzej Janusz, and Piotr Gawrysiak
- Subjects
Competition (economics), Scope (project management), Computer science, Data mining
- Abstract
We describe the ISMIS 2017 Data Mining Competition – “Trading Based on Recommendations” – which was held between November 22, 2016 and January 22, 2017 on the Knowledge Pit platform. We explain its scope and summarize its results. We also discuss the solution which achieved the best result among all participating teams.
- Published
- 2017
30. Helping AI to Play Hearthstone: AAIA'17 Data Mining Challenge
- Author
- Maciej Świechowski, Tomasz Tajmajer, and Andrzej Janusz
- Subjects
FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science and Game Theory (cs.GT), Artificial neural network, Scope (project management), Computer science, Monte Carlo tree search, Context (language use), Intelligent agent, Data mining
- Abstract
This paper summarizes the AAIA'17 Data Mining Challenge: Helping AI to Play Hearthstone, which was held between March 23 and May 15, 2017 on the Knowledge Pit platform. We briefly describe the scope and background of this competition in the context of a more general project related to the development of an AI engine for video games, called Grail. We also discuss the outcomes of this challenge and demonstrate how predictive models for the assessment of a player's winning chances can be utilized in the construction of an intelligent agent for playing Hearthstone. Finally, we show a few selected machine learning approaches for modeling state and action values in Hearthstone. We provide an evaluation of a few promising solutions that may be used to create more advanced types of agents, especially in conjunction with Monte Carlo Tree Search algorithms.
Comment: Federated Conference on Computer Science and Information Systems (FedCSIS 2017), Prague, Czech Republic
- Published
- 2017
- Full Text
- View/download PDF
31. Rough Set Methods for Attribute Clustering and Selection
- Author
-
Dominik Ślęzak and Andrzej Janusz
- Subjects
Reduct, Computer science, Computation, Machine learning, Artificial intelligence, Rough set, Data mining, Heuristics, Greedy algorithm, Cluster analysis - Abstract
In this study we investigate methods for attribute clustering and their possible applications to the task of computing decision reducts from information systems. We focus on high-dimensional datasets, in particular microarray data. For this type of data, the traditional reduct construction techniques can either be extremely computationally intensive or yield poor performance in terms of the size of the resulting reducts. We propose two reduct computation heuristics that combine greedy search with a diverse selection of candidate attributes. Our experiments confirm that by properly grouping similar (in some sense interchangeable) attributes, it is possible to significantly decrease computation time, as well as to increase the quality of the obtained reducts (i.e., to decrease their average size). We examine several criteria for attribute clustering, and we also identify so-called garbage clusters, which contain attributes that can be regarded as irrelevant.
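The clustering-guided greedy search described in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the data, the attribute names and the positive-region quality measure are illustrative. The key idea shown is that attributes from a cluster already represented in the partial reduct are treated as interchangeable with the chosen one and skipped.

```python
def pos_region_size(rows, decisions, attrs):
    """Number of objects whose indiscernibility class w.r.t. attrs
    contains only one decision value (the positive region)."""
    classes = {}
    for row, d in zip(rows, decisions):
        entry = classes.setdefault(tuple(row[a] for a in attrs), [0, set()])
        entry[0] += 1
        entry[1].add(d)
    return sum(n for n, ds in classes.values() if len(ds) == 1)

def greedy_reduct(rows, decisions, clusters):
    """Greedy reduct search drawing candidates only from clusters
    not yet represented in the partial reduct (assumes a consistent table)."""
    chosen = []
    while pos_region_size(rows, decisions, chosen) < len(rows):
        pool = [a for cl in clusters
                if not any(b in chosen for b in cl) for a in cl]
        if not pool:  # every cluster already contributed an attribute
            pool = [a for cl in clusters for a in cl if a not in chosen]
        chosen.append(max(pool,
                          key=lambda a: pos_region_size(rows, decisions, chosen + [a])))
    return chosen

# Toy table: decision = a XOR c, while b merely duplicates a, so the
# clustering groups the interchangeable attributes a and b together.
rows = [{"a": 0, "b": 0, "c": 0}, {"a": 0, "b": 0, "c": 1},
        {"a": 1, "b": 1, "c": 0}, {"a": 1, "b": 1, "c": 1}]
decisions = [0, 1, 1, 0]
print(greedy_reduct(rows, decisions, [["a", "b"], ["c"]]))  # ['a', 'c']
```

Once `a` is picked, the search never wastes a step evaluating its duplicate `b`, which is the computational saving the abstract refers to.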
- Published
- 2014
32. Interactive Method for Semantic Document Indexing Based on Explicit Semantic Analysis
- Author
-
Andrzej Janusz, Adam Krasuski, Hung Son Nguyen, and Wojciech Świeboda
- Subjects
Algebra and Number Theory, Information retrieval, Computer science, Semantic search, Theoretical Computer Science, Computational Theory and Mathematics, Semantic similarity, Explicit semantic analysis, Semantic computing, Semantic technology, Artificial intelligence, Natural language processing, Information Systems - Abstract
In this article we propose a general framework incorporating semantic indexing and search of texts within scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on feedback from the users in order to improve the quality of the tags it produces. In our experiments, we index our document corpus using the Explicit Semantic Analysis (ESA) method. In this algorithm, an external knowledge base is used to measure relatedness between words and concepts, and those assessments are utilized to assign meaningful concepts to given texts. In the paper, we explain how the weights expressing relations between particular words and concepts can be improved through interaction with users or by employing expert knowledge. We also present results of experiments on a document corpus acquired from the PubMed Central repository to show the feasibility of our approach.
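The ESA tagging loop with interactive weight updates can be sketched as follows. This is a minimal illustration: the hand-made `WORD_CONCEPT` table and its weights are hypothetical stand-ins for the TF-IDF word-concept scores that real ESA derives from a knowledge base.

```python
from collections import Counter

# Hypothetical word-concept association weights; in real ESA these are
# TF-IDF scores of words in knowledge-base concept articles.
WORD_CONCEPT = {
    "gene":    {"Genetics": 0.9, "Medicine": 0.3},
    "protein": {"Genetics": 0.7, "Medicine": 0.5},
    "patient": {"Medicine": 0.9},
}

def esa_tags(text, top_k=1):
    """Interpret a text as a weighted mixture of concepts and return
    the top-scoring concepts as tags."""
    scores = Counter()
    for word in text.lower().split():
        for concept, w in WORD_CONCEPT.get(word, {}).items():
            scores[concept] += w
    return [c for c, _ in scores.most_common(top_k)]

def feedback(word, concept, delta):
    """Interactive update: user feedback nudges one word-concept weight."""
    weights = WORD_CONCEPT.setdefault(word, {})
    weights[concept] = max(0.0, weights.get(concept, 0.0) + delta)

print(esa_tags("gene protein"))  # ['Genetics']
```

When users accept or reject a tag, `feedback` shifts the corresponding association weight, which is the interactive-improvement step the framework describes.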
- Published
- 2014
33. Combining multiple predictive models using genetic algorithms
- Author
-
Andrzej Janusz
- Subjects
Meta learning (computer science), Computer science, Regression analysis, Machine learning, Theoretical Computer Science, Artificial Intelligence, Genetic algorithm, Computer Vision and Pattern Recognition, Data mining - Abstract
Blending is a well-established technique, commonly used to increase the performance of predictive models. Its effectiveness has been confirmed in practice, as most recent international data-mining contest winners used some kind of committee of classifiers to produce their final entry. This paper presents a method of using a genetic algorithm to optimize an ensemble of multiple classification or regression models. An implementation of that method in the R system, called Genetic Meta-Blender, was tested during the Australasian Data Mining 2009 Analytic Challenge. The subject of this data mining competition was methods for combining predictive models. The described approach was awarded the Grand Champion prize for achieving the best overall result. In this paper, the purpose of the challenge is described and details of the winning approach are given. The results of Genetic Meta-Blender are also discussed and compared to several baseline scores. Additionally, GMB is evaluated on data from a different data mining competition, namely the SIAM SDM'11 Contest: Prediction of Biological Properties of Molecules from Chemical Structure.
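The core idea of evolving blending weights can be sketched like this. It is a generic genetic algorithm over ensemble weights, not the actual Genetic Meta-Blender (which was implemented in R); the population size, mutation scale and squared-error fitness are illustrative choices.

```python
import random

def blend_error(weights, preds, target):
    """Squared error of the weighted average of the models' predictions."""
    s = sum(weights)
    return sum(
        (sum(w * p[i] for w, p in zip(weights, preds)) / s - t) ** 2
        for i, t in enumerate(target)
    )

def genetic_blend(preds, target, pop=30, gens=60, seed=0):
    """Evolve a vector of positive blending weights, one per model."""
    rng = random.Random(seed)
    n = len(preds)
    population = [[rng.random() for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda w: blend_error(w, preds, target))
        parents = population[: pop // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]       # averaging crossover
            j = rng.randrange(n)
            child[j] = max(1e-6, child[j] + rng.gauss(0, 0.1))  # mutate one weight
            children.append(child)
        population = parents + children
    return min(population, key=lambda w: blend_error(w, preds, target))

# Toy setup: model 0 predicts the target perfectly, model 1 is constant noise,
# so the evolved weights should strongly favor model 0.
target = [0.0, 1.0, 0.0, 1.0]
preds = [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
weights = genetic_blend(preds, target)
```

The same loop extends to any number of base models; only the length of the weight vector changes.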
- Published
- 2012
34. Rough Sets
- Author
-
Dominik Slezak and Andrzej Janusz
- Published
- 2016
35. Unsupervised Similarity Learning from Textual Data
- Author
-
Dominik Ślęzak, Hung Son Nguyen, and Andrzej Janusz
- Subjects
Text corpus, Algebra and Number Theory, Information retrieval, Computer science, Semantics, Theoretical Computer Science, Computational Theory and Mathematics, Semantic similarity, Rough set, Similarity learning, Information Systems - Abstract
This paper presents research on the construction of a new unsupervised model for learning a semantic similarity measure from text corpora. The two main components of the model are a semantic interpreter of texts and a similarity function whose properties are derived from data. The first associates particular documents with concepts, defined in a knowledge base, that correspond to the topics covered by the corpus. It shifts the representation of the meaning of the texts from words, which can be ambiguous, to concepts with predefined semantics. With this new representation, the similarity function is derived from data using a modification of the dynamic rule-based similarity model, adjusted to the unsupervised case. The adjustment is based on a novel notion of an information bireduct, which has its origin in the theory of rough sets. This extension of classical information reducts is used to find diverse sets of reference documents described by diverse sets of reference concepts that determine different aspects of the similarity. The paper explains the general idea of the approach and gives some implementation guidelines. Additionally, results of preliminary experiments are presented to demonstrate the usefulness of the proposed model.
- Published
- 2012
36. Tagging Firefighter Activities at the Emergency Scene: Summary of AAIA’15 Data Mining Competition at Knowledge Pit
- Author
-
Andrzej Janusz, Dominik Slezak, Adam Krasuski, Michal Meina, Krzysztof Rykaczewski, and Bartosz Celmer
- Subjects
Data set, Decision support system, Data acquisition, Computer science, Data mining, Wireless sensor network - Abstract
In this paper, we summarize AAIA'15 data mining competition: Tagging Firefighter Activities at a Fire Scene, which was held between March 9 and July 6, 2015. We describe the scope and background of the competition. We also reveal details regarding the data set used in the competition, which was collected and tagged specifically for the purpose of this data challenge. We explain the data acquisition process which involved using a body sensor network system consisting of several inertial measurement units and a physiological data sensor. Finally, we briefly discuss submitted results with respect to their possible real-life application in our decision support system.
- Published
- 2015
37. Computation of Approximate Reducts with Dynamically Adjusted Approximation Threshold
- Author
-
Andrzej Janusz and Dominik Ślęzak
- Subjects
Data set, Theoretical computer science, Computer science, Computation, Feature selection, Relevance (information retrieval), Ranking (information retrieval) - Abstract
We continue our research on dynamically adjusted approximate reducts (DAAR). We modify the DAAR computation algorithm to take into account dependencies between attribute values in data. We discuss the motivation for this improvement and analyze its performance impact. We also revisit a filtering technique that utilizes approximate reducts to create a ranking of attributes according to their relevance. As an illustration, we study a data set from AAIA’14 Data Mining Competition.
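The underlying idea of an approximate reduct with an approximation threshold can be sketched as follows. This is a generic greedy scheme with a fixed threshold, offered only as background: the DAAR method summarized above adjusts the threshold dynamically, which this toy sketch does not attempt, and the consistency measure and data are illustrative.

```python
def consistency(rows, decisions, attrs):
    """Fraction of objects whose indiscernibility class w.r.t. attrs
    is pure, i.e. contains a single decision value."""
    classes = {}
    for row, d in zip(rows, decisions):
        entry = classes.setdefault(tuple(row[a] for a in attrs), [0, set()])
        entry[0] += 1
        entry[1].add(d)
    return sum(n for n, ds in classes.values() if len(ds) == 1) / len(rows)

def approximate_reduct(rows, decisions, attrs, eps):
    """Greedily add attributes until consistency reaches 1 - eps
    (eps is the allowed degree of approximation)."""
    chosen, remaining = [], list(attrs)
    while consistency(rows, decisions, chosen) < 1 - eps and remaining:
        best = max(remaining,
                   key=lambda a: consistency(rows, decisions, chosen + [a]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy table: attribute 'a' explains part of the decision, 'b' adds a bit more.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0},
        {"a": 1, "b": 1}, {"a": 1, "b": 0}]
decisions = [0, 0, 1, 1, 0]
print(approximate_reduct(rows, decisions, ["a", "b"], eps=0.65))  # ['a']
```

Loosening `eps` yields shorter attribute sets, which is the basic trade-off that approximation-threshold methods tune.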
- Published
- 2015
38. Rough Set Tools for Practical Data Exploration
- Author
-
Dominik Ślęzak, Sebastian Stawicki, Andrzej Janusz, and Marcin Szczuka
- Subjects
Data exploration, Software, Computer science, Integrated software, Computational statistics, Rough set, Software system, Software engineering - Abstract
We discuss a rough-set-based approach to the data mining process. We present a brief overview of rough-set-based data exploration and of the software systems developed for this purpose over the years. Then, we introduce the RapidRoughSets extension for the RapidMiner integrated software platform for machine learning and data mining, along with the RoughSets package for the R System – the leading software environment for statistical computing. We conclude with a discussion of the road ahead for rough set software systems.
- Published
- 2015
39. Mining Data from Coal Mines: IJCRS’15 Data Challenge
- Author
-
Marek Grzegorowski, Marek Sikora, Dominik Ślęzak, Andrzej Janusz, Sebastian Stawicki, Łukasz Wróbel, and Piotr Wojtas
- Subjects
Computer science, Coal mining, Active safety, Data science - Abstract
We summarize the data mining competition associated with the IJCRS’15 conference – IJCRS’15 Data Challenge: Mining Data from Coal Mines – organized on the Knowledge Pit web platform. The topic of this competition was related to the problem of active safety monitoring in underground corridors. In particular, the task was to design an efficient method of predicting dangerous concentrations of methane in longwalls of a Polish coal mine. We describe the scope and motivation for the competition. We also report the course of the contest and briefly discuss a few of the most interesting solutions submitted by participants. Finally, we reveal our plans for future research on this important subject.
- Published
- 2015
40. Assessment of data granulations in context of feature extraction problem
- Author
-
Andrzej Janusz and Marcin Szczuka
- Subjects
Quality assessment, Feature extraction, Feature selection, Pattern recognition, Granulation, Decision system, Data mining, Artificial intelligence, Random variable, Mathematics - Abstract
In this paper we investigate a method of measuring the quality of a data granulation in a decision system, defined by an indiscernibility relation in a specific type of approximation space. In the proposed algorithm, the concept of a random probe is used to estimate the probability that a given data granulation is relevant in a classification context. We explain the intuition behind our approach and show how it can be utilized in practical data analysis tasks such as attribute selection or the construction of new attributes. We also inspect relationships between the problem of finding a useful granulation of data and that of extracting informative features for supervised classification. To avoid deriving granules of low relevance, we perform a random probe test to verify their validity. Using this technique we can more objectively assess the usefulness of a given data granulation for solving the classification problem at hand.
- Published
- 2014
41. Key Risk Factors for Polish State Fire Service: a Data Mining Competition at Knowledge Pit
- Author
-
Adam Krasuski, Mariusz Rosiak, Andrzej Janusz, Hung Son Nguyen, Dominik Slezak, and Sebastian Stawicki
- Subjects
Computer science, Data analysis, Data mining - Abstract
In this paper we summarize AAIA'14 Data Mining Competition: Key Risk Factors for Polish State Fire Service, which was held between February 3, 2014 and May 5, 2014 on the Knowledge Pit platform http://challenge.mimuw.edu.pl/. We describe the scope and background of this competition and explain the evaluation procedure in detail. We also briefly overview the results of this analytical challenge, showing how those results can benefit another of our projects, one related to the problem of improving firefighter safety at a fire scene. Finally, we reveal some technical details regarding the architecture and functionalities of the Knowledge Pit competition platform, which we are developing in order to facilitate the solving of practical problems that require advanced data analytics.
- Published
- 2014
42. Algorithms for Similarity Relation Learning from High Dimensional Data
- Author
-
Andrzej Janusz
- Subjects
Machine learning, Semantic similarity, Normalized compression distance, Case-based reasoning, Artificial intelligence, Cluster analysis, Algorithm, Similarity learning, Mathematics - Abstract
The notion of similarity plays an important role in machine learning and artificial intelligence. It is widely used in tasks related to supervised classification, clustering, outlier detection and planning. Moreover, in domains such as information retrieval or case-based reasoning, the concept of similarity is essential, as it is used at every phase of the reasoning cycle. Similarity itself, however, is a very complex concept that eludes formal definition. The similarity of two objects can differ depending on the considered context. In many practical situations it is difficult even to evaluate the quality of similarity assessments without considering the task for which they were performed. For this reason, similarity should be learned from data, specifically for the task at hand. This paper presents research on the problem of similarity learning, which is part of the author’s PhD dissertation. It describes a similarity model, called Rule-Based Similarity, and shows algorithms for constructing this model from available data. The model utilizes notions from rough set theory to derive a similarity function that approximates the similarity relation in a given context. It is largely inspired by the idea of Tversky’s feature contrast model and has several analogous properties. In the paper, those theoretical properties are described and discussed. Moreover, the paper presents results of experiments on real-life data sets, in which the quality of the proposed model is thoroughly evaluated and compared with state-of-the-art algorithms.
- Published
- 2014
43. A Resemblance Based Approach for Recognition of Risks at a Fire Ground
- Author
-
Łukasz Sosnowski, Andrzej Pietruszka, Adam Krasuski, and Andrzej Janusz
- Subjects
Decision support system, Computer science, Firefighting, Process mining, Timeline, Machine learning, Artificial intelligence - Abstract
This article focuses on the problem of comparing fire & rescue actions for decision support at the fire ground. In our research, we split the actions into a set of frames which compose a timeline of the firefighting process. In our approach, the frames are represented as compound objects. We extract a set of features in order to represent these objects and we apply a comparator framework for the evaluation of similarities between the processes. The similarity constraints allow us to recognize the risks that appear during the actions. We justify our approach by showing the results of a series of experiments based on reports describing real-life incidents.
- Published
- 2014
44. Random Probes in Computation and Assessment of Approximate Reducts
- Author
-
Dominik Ślęzak and Andrzej Janusz
- Subjects
Reduct, Clustering high-dimensional data, Computer science, Computation, Decision vector, Feature selection, Greedy algorithm - Abstract
We discuss applications of random probes in the process of computing and assessing approximate reducts. By random probes we mean artificial attributes, generated independently from the decision vector but having value distributions similar to the attributes in the original data. We introduce the concept of a randomized reduct, which is a reduct constructed solely from random probes, and we show how to use it for unsupervised evaluation of attribute sets. We also propose a modification of the greedy heuristic for the computation of approximate reducts, which reduces the chance of including irrelevant attributes in a reduct. To support our claims we present the results of experiments on high-dimensional data. Analysis of the obtained results confirms the usefulness of random probes in the search for informative attribute sets.
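The random-probe idea can be sketched as a simple permutation test. The code is illustrative, not the authors' algorithm: the purity-style attribute score is a stand-in for whatever quality measure a reduct heuristic uses, and the probe here is obtained by shuffling the attribute column, which preserves its value distribution while breaking any link to the decision.

```python
import random

def attribute_score(column, decisions):
    """Purity-style quality: fraction of objects lying in value groups
    that carry a single decision."""
    groups = {}
    for v, d in zip(column, decisions):
        entry = groups.setdefault(v, [0, set()])
        entry[0] += 1
        entry[1].add(d)
    return sum(n for n, ds in groups.values() if len(ds) == 1) / len(column)

def probe_pvalue(column, decisions, n_probes=200, seed=0):
    """How often does a random probe (a shuffled copy of the column)
    score at least as well as the real attribute?"""
    rng = random.Random(seed)
    real = attribute_score(column, decisions)
    hits = 0
    for _ in range(n_probes):
        probe = list(column)
        rng.shuffle(probe)
        hits += attribute_score(probe, decisions) >= real
    return hits / n_probes

decisions = [0, 0, 1, 1, 0, 1, 0, 1]
informative = list(decisions)   # perfectly mirrors the decision
irrelevant = list(range(8))     # all-distinct values: trivially "pure"
```

A low p-value for `informative` and the maximal p-value for `irrelevant` illustrate how probes expose attributes whose apparent quality is merely an artifact of their value distribution.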
- Published
- 2014
45. Multi-label Classification of Biomedical Articles
- Author
-
Marcin Tatjewski, Hung Son Nguyen, Andrzej Janusz, Krzysztof Pawłowski, Łukasz Romaszko, and Karol Kurach
- Subjects
Multi-label classification, Computer science, Search engine indexing, Machine learning, Ensemble learning, Binary classification, Explicit semantic analysis, Artificial intelligence - Abstract
In this paper we investigate a special case of the classification problem, called multi-label learning, where each instance (or object) is associated with a set of target labels (or simple decisions). Multi-label classification is one of the most important issues in semantic indexing and text categorization systems. Most multi-label classification methods are based on combinations of binary classifiers, which are trained separately for each label. In this paper we concentrate on the application of ensemble techniques to the multi-label classification problem. We present the most recent ensemble methods for both the binary classifier training phase and the combination learning phase. The proposed methods have been implemented within the SONCA system, which is a part of the SYNAT project. We present some experimental results obtained on the PubMed Central database of biomedical articles.
- Published
- 2013
46. Semantic Clustering of Scientific Articles Using Explicit Semantic Analysis
- Author
-
Andrzej Janusz and Marcin Szczuka
- Subjects
Text corpus, Information retrieval, Computer science, Semantics, Information extraction, Knowledge base, Explicit semantic analysis, Rough set, Cluster analysis - Abstract
This paper summarizes our recent research on semantic clustering of scientific articles. We present a case study focused on the analysis of papers related to Rough Sets theory. The proposed method groups the documents on the basis of their content, with the assistance of the DBpedia knowledge base. The text corpus is first processed using Natural Language Processing tools in order to produce vector representations of the content. In the second step the articles are matched against a collection of concepts retrieved from DBpedia. As a result, a new representation is constructed that better reflects the semantics of the texts. With this new representation the documents are hierarchically clustered in order to form a partitioning of papers into semantically related groups. The steps in textual data preparation, the utilization of DBpedia and the employed clustering methods are explained and illustrated with experimental results. The quality of the resulting clustering is then discussed. It is assessed using feedback from human experts combined with typical cluster quality measures. These results are then discussed in the context of a larger framework that aims to facilitate search and information extraction from large textual repositories.
- Published
- 2013
47. Semantic Clustering of Scientific Articles with Use of DBpedia Knowledge Base
- Author
-
Andrzej Janusz, Marcin Szczuka, and Kamil Herba
- Subjects
Text corpus, Information retrieval, Computer science, Semantics, Text mining, Semantic similarity, Knowledge base, Rough set, Cluster analysis - Abstract
A case study of semantic clustering of scientific articles related to Rough Sets is presented. The proposed method groups the documents on the basis of their content and with the assistance of the DBpedia knowledge base. The text corpus is first processed with Natural Language Processing tools in order to produce vector representations of the content and then matched against a collection of concepts retrieved from DBpedia. As a result, a new representation is constructed that better reflects the semantics of the texts. With this new representation, the documents are hierarchically clustered in order to form a partition of papers into semantically related groups. The steps in textual data preparation, the utilization of DBpedia and the clustering are explained and illustrated with experimental results. An assessment of clustering quality by human experts and by comparison to a traditional approach is presented.
- Published
- 2012
48. Interactive Document Indexing Method Based on Explicit Semantic Analysis
- Author
-
Andrzej Janusz, Adam Krasuski, Wojciech Świeboda, and Hung Son Nguyen
- Subjects
Information retrieval, Computer science, Search engine indexing, Semantic search, Interactive learning, Knowledge base, Explicit semantic analysis, Artificial intelligence, Natural language processing - Abstract
In this article we propose a general framework incorporating semantic indexing and search of texts within scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on feedback from the users in order to improve the quality of the tags it produces. In our experiments, we index our document corpus using the Explicit Semantic Analysis (ESA) method. In this algorithm, an external knowledge base is used to measure relatedness between words and concepts, and those assessments are utilized to assign meaningful concepts to given texts. In the paper, we explain how the weights expressing relations between particular words and concepts can be improved through interaction with users or by employing expert knowledge. We also present results of experiments on a document corpus acquired from the PubMed Central repository to show the feasibility of our approach.
- Published
- 2012
49. Dynamic Rule-Based Similarity Model for DNA Microarray Data
- Author
-
Andrzej Janusz
- Subjects
Reduct, Pattern recognition, Semantic similarity, Feature (machine learning), Artificial intelligence, Rough set, Data mining, Mathematics - Abstract
Rule-Based Similarity (RBS) is a framework in which concepts from rough set theory are used for learning a similarity relation from data. This paper presents an extension of RBS called the Dynamic Rule-Based Similarity model (DRBS), which is designed to boost the quality of the learned relation in the case of high-dimensional data. Rule-Based Similarity utilizes the notion of a reduct to construct new features which can be interpreted as important aspects of similarity in the classification context. Having defined such features, it is possible to utilize the idea of Tversky's feature contrast similarity model in order to design an accurate and psychologically plausible similarity relation for a given domain of objects. DRBS tries to incorporate a broader array of aspects of similarity into the model by constructing many heterogeneous sets of features from multiple decision reducts. To ensure diversity, the reducts are computed on random subsets of objects and attributes. This approach is particularly well suited to the "few-objects-many-attributes" problem, such as the mining of DNA microarray data. The induced similarity relation and the resulting similarity function can be used to perform accurate classification of previously unseen objects in a case-based fashion. Experiments, whose results are also presented in the paper, show that the proposed model can successfully compete with other state-of-the-art algorithms such as Random Forest or SVM.
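Tversky's feature contrast model, which the abstract above builds on, is easy to state in code. The sketch below is illustrative only: the reference cases and rule-derived feature names (`r1`, `r2`, ...) are made up, and real DRBS derives such feature sets from multiple decision reducts rather than taking them as given.

```python
def tversky_sim(a, b, alpha=0.5, beta=0.5):
    """Feature contrast: shared features raise similarity, while
    features distinctive to either object lower it."""
    a, b = set(a), set(b)
    return len(a & b) - alpha * len(a - b) - beta * len(b - a)

def classify(query_features, cases):
    """Case-based classification: label of the most similar reference case."""
    return max(cases, key=lambda case: tversky_sim(query_features, case[0]))[1]

# Hypothetical rule-derived features of two reference cases:
cases = [({"r1", "r2"}, "pos"), ({"r3"}, "neg")]
print(classify({"r1", "r2", "r3"}, cases))  # pos
```

The asymmetry parameters `alpha` and `beta` weight the two sets of distinctive features separately, which is what makes the model psychologically plausible (similarity judgments need not be symmetric).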
- Published
- 2012
50. JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers
- Author
-
Hung Son Nguyen, Sebastian Stawicki, Adam Krasuski, Andrzej Janusz, and Dominik Ślęzak
- Subjects
Multi-label classification, Information retrieval, Test data generation, Computer science, Data science, Explicit semantic analysis, Scalability, Data mining - Abstract
We summarize the JRS’2012 Data Mining Competition on “Topical Classification of Biomedical Research Papers”, held between January 2, 2012 and March 30, 2012 as an interactive on-line contest hosted on the TunedIT platform ( http://tunedit.org ). We present the scope and background of the challenge task, the evaluation procedure, the progress, and the results. We also present a scalable method for generating the contest data from biomedical research papers.
- Published
- 2012