63 results for "Andrzej Janusz"
Search Results
2. A Practical Study of Methods for Deriving Insightful Attribute Importance Rankings Using Decision Bireducts
- Author
- Andrzej Janusz, Dominik Ślęzak, Sebastian Stawicki, and Krzysztof Stencel
- Published
- 2023
3. BrightBox — A rough set based technology for diagnosing mistakes of machine learning models
- Author
- Andrzej Janusz, Andżelika Zalewska, Łukasz Wawrowski, Piotr Biczyk, Jan Ludziejewski, Marek Sikora, and Dominik Ślęzak
- Subjects
- Software
- Published
- 2023
4. IEEE BigData Cup 2022 Report: Privacy-preserving Matching of Encrypted Images
- Author
- Marcin Szczuka, Andrzej Janusz, Bogusław Cyganek, Jakub Grabek, Łukasz Przebinda, Andżelika Zalewska, Andrzej Bukala, and Dominik Ślęzak
- Published
- 2022
5. EVEAL - Expected Variance Estimation for Active Learning
- Author
- Daniel Kaluza, Andrzej Janusz, and Dominik Ślęzak
- Published
- 2022
6. KnowledgePit Meets BrightBox: A Step Toward Insightful Investigation of the Results of Data Science Competitions
- Author
- Andrzej Janusz and Dominik Ślęzak
- Published
- 2022
7. Prescriptive Analytics for Optimization of FMCG Delivery Plans
- Author
- Marek Grzegorowski, Andrzej Janusz, Stanisław Łażewski, Maciej Świechowski, and Monika Jankowska
- Published
- 2022
8. Data-Driven Resilient Supply Management Supported by Demand Forecasting
- Author
- Marek Grzegorowski, Andrzej Janusz, Jarosław Litwin, and Łukasz Marcinowski
- Published
- 2022
9. Predicting Victories in Video Games - IEEE BigData 2021 Cup Report
- Author
- Maciej Matraszek, Andrzej Janusz, Maciej Świechowski, and Dominik Ślęzak
- Published
- 2021
10. Thermally stimulated processes in low density polyethylene
- Author
- Andrzej Janusz Markiewicz
- Subjects
- Uncategorized
- Abstract
This thesis was scanned from the print manuscript for digital preservation and is copyright the author. Researchers can access this thesis by asking their local university, institution or public library to make a request on their behalf. Monash staff and postgraduate students can use the link in the References field.
- Published
- 2021
- Full Text
- View/download PDF
11. Predicting Escalations in Customer Support: Analysis of Data Mining Challenge Results
- Author
- Daniel Kaluza, Guohua Hao, Dominik Ślęzak, Andrzej Janusz, Robert Wojciechowski, and Tony Li
- Subjects
Event (computing), Computer science, Frame (networking), Big data, Data science, Data modeling, Software, Task analysis, Data analysis, Quality (business)
- Abstract
We summarize the IEEE Big Data Cup: Predicting Escalations in Customer Support – a data mining competition organized jointly by the companies Information Builders and QED Software on the KnowledgePit platform, as part of the 2020 IEEE International Conference on Big Data. We discuss the motivation for organizing this event and highlight the factors that make it such a challenging topic. We describe the data provided to participants and formulate the competition task. We also provide an overview of the competition results, with a detailed analysis of a few selected solutions. Finally, we present a novel functionality of the KnowledgePit platform – an analytic module that allows organizers to investigate selected solutions using a convenient GUI and provides in-depth insights into their quality.
- Published
- 2020
12. Network Device Workload Prediction: A Data Mining Challenge at Knowledge Pit
- Author
- Andrzej Janusz, Piotr Biczyk, Mateusz Przyborowski, and Dominik Ślęzak
- Subjects
Computer science, Workload prediction, Networking hardware, Competition (economics), Upload, Data mining, Baseline (configuration management)
- Abstract
We describe the 7th edition of the international data mining competition held at Knowledge Pit in association with the FedCSIS conference series. The goal was to predict workload-related characteristics of monitored network devices. We analyze solutions uploaded by the most successful participants. We investigate prediction errors which had the greatest influence on their results. We also present our own baseline solution which turned out to be the most reliable in the final evaluation.
- Published
- 2020
13. On Positive-Correlation-Promoting Reducts
- Author
- Joanna Henzel, Andrzej Janusz, Marek Sikora, and Dominik Ślęzak
- Subjects
Discrete mathematics, Rule induction, Binary number, Feature selection, Positive correlation, R package, Knowledge extraction, Rough set, Mathematics
- Abstract
We introduce a new rough-set-inspired binary feature selection framework, whereby it is preferred to choose attributes which let us distinguish between objects (cases, rows, examples) having different decision values according to the following mechanism: for objects u1 and u2 with decision values dec(u1) = 0 and dec(u2) = 1, it is preferred to select attributes a such that a(u1) = 0 and a(u2) = 1, with the secondary option – if the first one is impossible – to select a such that a(u1) = 1 and a(u2) = 0. We discuss the background for this approach, originally inspired by the needs of genetic data analysis. We show how to derive the sets of such attributes – called positive-correlation-promoting reducts (PCP reducts for short) – using standard calculations over appropriately modified rough-set-based discernibility matrices. The proposed framework is implemented within the RoughSets R package, which is widely used for data exploration and knowledge discovery purposes.
- Published
- 2020
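The per-pair attribute-preference rule described in the abstract above can be sketched in a few lines. The function name and the toy decision table below are invented for illustration; this is not the paper's code:

```python
# Illustrative sketch of the attribute-preference rule behind PCP reducts.
# (Function name and toy data are assumptions, not from the paper.)

def pcp_preferred_attributes(u1, u2, attributes):
    """For objects with dec(u1) = 0 and dec(u2) = 1, return the attributes
    that discern them, preferring the positive-correlation-promoting
    direction a(u1) = 0 and a(u2) = 1; fall back to the opposite one."""
    primary = [a for a in attributes if u1[a] == 0 and u2[a] == 1]
    if primary:
        return primary
    return [a for a in attributes if u1[a] == 1 and u2[a] == 0]

# Toy binary decision table: u1 has decision 0, u2 has decision 1.
u1 = {"a1": 0, "a2": 1, "a3": 0}
u2 = {"a1": 1, "a2": 0, "a3": 0}

print(pcp_preferred_attributes(u1, u2, ["a1", "a2", "a3"]))  # ['a1']
```

Deriving full PCP reducts additionally requires the modified discernibility-matrix calculations mentioned in the abstract; the sketch only shows the preference applied to a single pair of objects.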
14. IEEE BigData 2019 Cup: Suspicious Network Event Recognition
- Author
- Agnieszka Chądzyńska-Krasowska, Daniel Kaluza, Dominik Ślęzak, Joel Holland, Bartek Konarski, and Andrzej Janusz
- Subjects
Feature engineering, Competition (economics), Recurrent neural network, Software, Analytics, Computer science, Big data, Baseline (configuration management), Data science, Scope (computer science)
- Abstract
“IEEE BigData 2019 Cup: Suspicious Network Event Recognition” was a data mining competition organized jointly by the companies Security On-Demand and QED Software on the KnowledgePit online platform, in association with the IEEE BigData 2019 conference. The scope of this challenge covered the notions of cybersecurity analytics and network alert evaluation. In this paper, we summarize the results of the competition. We explain how the data sets were prepared before they could be made available to competition participants. We describe the baseline scoring models that we designed as a reference for participants, and we demonstrate how critical appropriate feature engineering was to their performance. We also discuss the results of experiments conducted to verify the (un)suitability of deep recurrent neural networks in this particular case. In some sense, we show that there are no “perfect” machine learning approaches that could be applied equally successfully to every data science undertaking.
- Published
- 2019
15. A framework for learning and embedding multi-sensor forecasting models into a decision support system: A case study of methane concentration in coal mines
- Author
- Marek Sikora, Dominik Ślęzak, Andrzej Janusz, Łukasz Wróbel, Marek Grzegorowski, Michał Kozielski, Sebastian Stawicki, and Sinh Hoa Nguyen
- Subjects
Decision support system, Information Systems and Management, Computer science, Feature extraction, Coal mining, Computer Science Applications, Theoretical Computer Science, Task (project management), Data set, Artificial Intelligence, Control and Systems Engineering, Feature (computer vision), Analytics, Embedding, Software
- Abstract
We introduce a new approach for learning forecasting models over large multi-sensor data sets, including the steps of sliding-window-based feature extraction and rough-set-inspired feature subset ensemble selection. We show how to integrate this approach with the major data-processing-related components of DISESOR – a decision support system which is a coherent and complete framework for exploring streams of sensor readings registered in underground coal mines. As a case study, we report our experiments related to the task of methane concentration forecasting. The contributions of this paper comprise both an analysis of how the nature of sensor readings influenced the architecture of the developed system and empirical evidence that the designed methods for data processing and analytics turned out to be efficient in practice.
- Published
- 2018
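The sliding-window feature extraction step mentioned in the abstract above can be illustrated with a minimal sketch. The chosen window statistics and the toy sensor series are assumptions for illustration, not the DISESOR implementation:

```python
# Minimal sliding-window feature extraction over a univariate sensor series.
# (Window size, step, and the selected statistics are illustrative choices.)

def window_features(readings, width, step):
    """Emit basic aggregate features for each window of the series."""
    feats = []
    for start in range(0, len(readings) - width + 1, step):
        w = readings[start:start + width]
        feats.append({
            "mean": sum(w) / width,
            "min": min(w),
            "max": max(w),
            "trend": w[-1] - w[0],  # crude slope proxy over the window
        })
    return feats

series = [0.4, 0.5, 0.7, 0.6, 0.9, 1.1]  # e.g., consecutive sensor readings
for f in window_features(series, width=3, step=2):
    print(f)
```

In a multi-sensor setting the same extraction would run per sensor, and the resulting feature vectors would feed the forecasting models described in the abstract.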
16. Cost Optimization for Big Data Workloads Based on Dynamic Scheduling and Cluster-Size Tuning
- Author
- Andrzej Janusz, Cas Apanowicz, Petre Lameski, Marek Grzegorowski, Eftim Zdravevski, and Dominik Ślęzak
- Subjects
Schedule, Data processing, Information Systems and Management, Computer science, Distributed computing, Big data, Spot market, Cloud computing, Dynamic priority scheduling, Total cost of ownership, Computer Science Applications, Management Information Systems, Scalability, Information Systems
- Abstract
Analytical data processing has become the cornerstone of today's business success, and it is facilitated by Big Data platforms that offer virtually limitless scalability. However, minimizing the total cost of ownership (TCO) for the infrastructure can be challenging. We propose a novel method to build resilient clusters on cloud resources that are fine-tuned to the particular data processing task. The presented architecture follows the infrastructure-as-code paradigm so that the cluster can be dynamically configured and managed. It first identifies the optimal cluster size to perform a job in the required time. Then, by analyzing spot instance price history and using ARIMA models, it optimizes the schedule of the job execution to leverage the discounted prices of the cloud spot market. In particular, we evaluated savings opportunities when using Amazon EC2 spot instances compared to on-demand resources. The performed experiments confirmed that the prediction module significantly improved the cost-effectiveness of the solution – up to 80% savings compared to the on-demand prices and, in the worst case, only 1% more than the absolute minimum cost. The production deployments of the architecture show that it is invaluable for minimizing the total cost of ownership of analytical data processing solutions.
- Published
- 2021
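The scheduling idea from the abstract above (forecast spot prices, then place the job in the cheapest contiguous window) can be sketched as follows. A seasonal-naive forecast stands in for the paper's ARIMA models, and all prices are invented:

```python
# Sketch of spot-market-aware job scheduling. A seasonal-naive forecast
# replaces the ARIMA models used in the paper; prices are made-up examples.

def forecast_next_day(hourly_prices, period=24):
    """Seasonal-naive forecast: assume the next day repeats the last one."""
    return hourly_prices[-period:]

def cheapest_window(prices, job_hours):
    """Return (start_hour, total_cost) of the cheapest contiguous slot."""
    cost, start = min(
        (sum(prices[s:s + job_hours]), s)
        for s in range(len(prices) - job_hours + 1)
    )
    return start, cost

history = [0.10] * 20 + [0.03, 0.03, 0.04, 0.10]  # last 24 hourly spot prices
pred = forecast_next_day(history)
start, cost = cheapest_window(pred, job_hours=2)
print(start, round(cost, 2))  # schedule the 2-hour job at the cheapest slot
```

A production system would additionally bound the window search by the job's deadline and re-forecast as new price observations arrive.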
17. Decision bireducts and decision reducts – a comparison
- Author
- Dominik Ślęzak, Sebastian Widz, Andrzej Janusz, and Sebastian Stawicki
- Subjects
Computational complexity theory, Applied Mathematics, Decision tree, Evidential reasoning approach, Decision rule, Machine learning, Theoretical Computer Science, Artificial Intelligence, Influence diagram, Artificial intelligence, Completeness (statistics), Optimal decision, Software, Mathematics
- Abstract
In this paper we revise the notion of decision bireducts. We show new interpretations and prove several important and practically useful facts regarding this notion. We also explain how some of the well-known algorithms for the computation of decision reducts can be modified for the purpose of computing decision bireducts. For the sake of completeness of our study, we extend our investigations to relations between decision bireducts and so-called approximate decision reducts. We compare different formulations of those two approaches and draw analogies between them. We also report new results related to the NP-hardness of searching for optimal decision bireducts and approximate decision reducts in data. Finally, we present new results of empirical tests which demonstrate the usefulness of decision bireducts in the construction of efficient, yet simple, ensembles of classification models.
- Published
- 2017
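A decision bireduct pairs an attribute subset B with an object subset X such that B discerns every pair of objects in X that have different decision values. A hedged sketch of that consistency check, on a toy decision table (this is an illustration, not one of the paper's algorithms):

```python
# Check the basic bireduct condition: within object subset X, attribute
# subset B must discern every pair of objects with different decisions.
# (Toy decision table invented for illustration.)

def consistent_on(table, decisions, B, X):
    for i in X:
        for j in X:
            if decisions[i] != decisions[j]:
                if all(table[i][b] == table[j][b] for b in B):
                    return False
    return True

table = [
    {"a": 0, "b": 1},
    {"a": 0, "b": 0},
    {"a": 1, "b": 0},
]
decisions = [0, 1, 1]
print(consistent_on(table, decisions, {"b"}, {0, 1, 2}))  # True
print(consistent_on(table, decisions, {"a"}, {0, 1, 2}))  # False: 0 and 1 agree on a
print(consistent_on(table, decisions, {"a"}, {0, 2}))     # True: shrinking X helps
```

The trade-off visible in the last two calls, accepting a smaller X in exchange for a smaller B, is exactly what distinguishes bireducts from classical decision reducts.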
18. Clash Royale Challenge: How to Select Training Decks for Win-rate Prediction
- Author
- Andrzej Janusz, Marek Grzegorowski, and Łukasz Grad
- Subjects
Scope (project management), Computer science, Data science, Task (project management), Competition (economics), Active learning, Information system
- Abstract
We summarize the sixth data mining competition organized at the Knowledge Pit platform in association with the Federated Conference on Computer Science and Information Systems series, titled Clash Royale Challenge: How to Select Training Decks for Win-rate Prediction. We outline the scope of this challenge and briefly present its results. We also discuss the problem of acquiring knowledge about new notions from video games through an active learning cycle. We explain how this task is related to the problem considered in the challenge and share the results of experiments that we conducted to demonstrate the usefulness of the active learning approach in practice.
- Published
- 2019
19. Analytics over Multi-sensor Time Series Data – A Case-Study on Prediction of Mining Hazards
- Author
- Andrzej Janusz and Dominik Ślęzak
- Subjects
Decision support system, Computer science, Dimensionality reduction, Interchangeability, Task (project management), Subject matter expert, Analytics, Data mining, Time series, Predictive modelling
- Abstract
Mining high-dimensional time series data that represent readings of multiple sensors is a challenging task. We focus on several important aspects of analytics over such data. We describe a methodology for extracting informative features from multidimensional data streams, as well as algorithms for finding compact representations of such data, in order to facilitate the construction of prediction models. We pay special attention to designing new approaches to dimensionality reduction and to the interchangeability of the features that such representations comprise. We validate our algorithms on data sets obtained from coal mines and demonstrate how their results can be applied in the construction of a decision support system. We show that such a system is efficient and that its outcomes can be easily interpreted by subject matter experts.
- Published
- 2019
20. SENSEI: An Intelligent Advisory System for the eSport Community and Casual Players
- Author
- Andrzej Janusz, Krzysztof Stencel, Sebastian Stawicki, and Dominik Ślęzak
- Subjects
Advisory system, Analytics, Computer science, Human–computer interaction, Conceptual architecture, Advice (programming)
- Abstract
In this article, we describe the SENSEI system. It helps players to improve their skills in popular eSports games. We discuss the main goals of the system and explain the associated challenges. We also present its conceptual architecture which aims at enabling full automation of the data acquisition and analytic processes. The system is expected to provide in-depth analytics of players' performance and give practical advice regarding possible improvements. Thus its architecture allows players to provide feedback and manually label important concepts. Finally, we discuss our first case study - an advisory system for popular collectible card video games.
- Published
- 2018
21. Toward Machine Learning on Granulated Data – a Case of Compact Autoencoder-based Representations of Satellite Images
- Author
- Tomasz Tajmajer, Dominik Ślęzak, Andrzej Janusz, Piotr Biczyk, Mateusz Przyborowski, and Łukasz Grad
- Subjects
Computer science, Deep learning, Object recognition, Pattern recognition, Autoencoder, Image (mathematics), Satellite, Artificial intelligence, Image retrieval
- Abstract
We consider a problem of learning from compact representations of images for a purpose of object recognition and content-based image retrieval. We discuss a motivation for using compressed images in those tasks and indicate exemplary applications related to analysis on the data from satellites. Finally, we show some preliminary results of experiments conducted to demonstrate the impact of the image data granulation on the quality of classification. We empirically compare the performance of prediction models trained on original images, images compressed using autoencoders, and on images whose quality was lowered in order to reduce their size.
- Published
- 2018
22. Utilizing Hybrid Information Sources to Learn Representations of Cards in Collectible Card Video Games
- Author
- Andrzej Janusz, Dominik Ślęzak, and Łukasz Grad
- Subjects
Information retrieval, Computer science, Latent semantic analysis, Interchangeability, Task (project management), Similarity (psychology), Task analysis, Word2vec, The Internet
- Abstract
We investigate the problem of learning representations of cards in collectible card video games. Our goal is to utilize such representations in modeling contextual similarity between cards. When constructed appropriately, such similarity models can offer many benefits to players. In particular, one can employ them to recommend cheaper or more available card replacements in popular decks. To this end, we utilize some known NLP methods, such as word2vec and Latent Semantic Analysis, to extract card embeddings from their base characteristics and textual descriptions. We also propose two new approaches that make use of information regarding multiple decks constructed by the community of players and attempt to capture the notion of card interchangeability. We empirically validate the described methods and compare their performance using data obtained for two popular games, Hearthstone: Heroes of Warcraft and Clash Royale. In the experiments, we consider various representations of cards and then derive the corresponding similarities. To validate the compared methods, we check how consistent the similarity measurements they produce are with the assessments made by experienced players. The results of our analysis show that combining the outcomes of methods that work with different sources of information, i.e., textual descriptions of individual cards and deck-specific card co-occurrences, can improve performance in the task of similarity assessment. Moreover, a clustering of cards in the constructed vector space can provide some interesting insights for the community of players. As already mentioned, it can be used to suggest replacements for cards that players lack in their collections or to indicate cards that are likely to deteriorate the win chances of particular decks.
- Published
- 2018
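The deck-co-occurrence idea from the abstract above can be sketched by representing each card as a binary vector over decks and comparing cards with cosine similarity. The deck lists and card names below are invented for illustration:

```python
# Minimal co-occurrence-based card similarity: each card becomes a binary
# vector over decks (1 = card appears in that deck); cards that occur in
# the same decks come out as similar. Deck data is made up.

import math

def card_vectors(decks):
    """Map each card to a binary vector over decks."""
    cards = sorted({c for deck in decks for c in deck})
    return {c: [1 if c in deck else 0 for deck in decks] for c in cards}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

decks = [
    {"Fireball", "Knight", "Archers"},
    {"Fireball", "Knight", "Giant"},
    {"Zap", "Giant", "Archers"},
]
vecs = card_vectors(decks)
print(round(cosine(vecs["Fireball"], vecs["Knight"]), 2))  # 1.0: same deck profile
print(round(cosine(vecs["Fireball"], vecs["Zap"]), 2))     # 0.0: never share a deck
```

The paper's methods additionally combine such co-occurrence signals with text-based embeddings (word2vec, LSA) of card descriptions; this sketch shows only the co-occurrence half.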
23. Investigating Similarity between Hearthstone Cards: Text Embeddings and Interchangeability Approaches
- Author
- Dominik Ślęzak and Andrzej Janusz
- Subjects
Information retrieval, Computer science, Similarity (psychology), Word2vec, Video game, Interchangeability
- Abstract
We investigate similarities between cards and decks from the video game Hearthstone: Heroes of Warcraft. We utilize some NLP methods, such as word2vec and LSA, to learn card representations from their descriptions. We also attempt to quantify interchangeability of cards in Hearthstone decks. We experimentally validate the presented methods and compare their performance using the data obtained from players. The results of our analysis can help us to introduce a kind of clustering of the space of Hearthstone cards, for the purpose of providing new decision support tools for the community of players.
- Published
- 2018
24. Toward an Intelligent HS Deck Advisor: Lessons Learned from AAIA'18 Data Mining Competition
- Author
- Jacek Puczniewski, Tomasz Tajmajer, Andrzej Janusz, Dominik Ślęzak, Łukasz Grad, and Maciej Świechowski
- Subjects
Artificial neural network, Scope (project management), Computer science, Deck, Task (project management), Competition (economics), Task analysis, Data mining
- Abstract
We summarize the AAIA'18 Data Mining Competition organized at the Knowledge Pit platform. We explain the competition's scope and outline its results. We also review several approaches to the problem of representing Hearthstone decks in a vector space. We divide such approaches into categories based on the type of data about individual cards that they use. Finally, we outline experiments aimed at evaluating the usefulness of various deck representations for the task of win-rate prediction.
- Published
- 2018
25. How to Match Jobs and Candidates - A Recruitment Support System Based on Feature Engineering and Advanced Analytics
- Author
- Andrzej Janusz, Sebastian Stawicki, Dominik Ślęzak, Krzysztof Stencel, Krzysztof Ciebiera, and Michał Drewniak
- Subjects
Feature engineering, Data processing, Focus (computing), Computer science, Recommender system, Data science, Analytics, Domain knowledge, Word2vec, Architecture
- Abstract
We describe a recruitment support system that aims to help recruiters find candidates who are likely to be interested in a given job offer. We present the architecture of the system and explain the roles of its main modules. We also give examples of analytical processes supported by the system. In the paper, we focus on a data processing chain that utilizes domain knowledge for the extraction of meaningful features representing pairs of candidates and offers. Moreover, we discuss the usage of a word2vec model for finding concise vector representations of the offers, based on their short textual descriptions. Finally, we present the results of an empirical evaluation of our system.
- Published
- 2018
26. On the role of feature space granulation in feature selection processes
- Author
- Andrzej Janusz, Marcin Szczuka, Dominik Ślęzak, and Marek Grzegorowski
- Subjects
Theoretical computer science, Computer science, Feature vector, Feature extraction, Feature selection, Knowledge extraction, Feature (computer vision), Similarity (psychology), Rough set, Cluster analysis
- Abstract
Information granulation plays an important role in the process of scaling up modern machine learning and knowledge discovery algorithms. By employing compact descriptions of granules — whereby granules are defined as collections of original data elements gathered together by means of their similarity, proximity or functionality — one can drastically accelerate computations and, moreover, make the results of those computations more meaningful for domain experts. In this paper, we summarize some of the feature space granulation approaches introduced by now. We discuss the meaning of similarity, proximity and functionality while considering the granules of physically existing or potentially derivable attributes. We also show several examples of utilization of the granulation structures defined over the feature spaces in the feature selection algorithms. As a case study, we consider the algorithms developed within the theory of rough sets, aimed at finding irreducible subsets of attributes that are sufficient to distinguish between the cases belonging to different target decision classes.
- Published
- 2017
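The rough-set notion mentioned at the end of the abstract above (an irreducible subset of attributes sufficient to distinguish cases from different decision classes) is typically approximated with a greedy search. A minimal sketch on invented data, not the specific algorithms discussed in the paper:

```python
# Greedy rough-set-style attribute (feature) selection: repeatedly add the
# attribute that discerns the most not-yet-discerned pairs of objects from
# different decision classes. Toy data; assumes a consistent table (every
# such pair is discernible by some attribute).

from itertools import combinations

def greedy_reduct(table, decisions, attributes):
    pairs = [(i, j) for i, j in combinations(range(len(table)), 2)
             if decisions[i] != decisions[j]]
    chosen, remaining = [], list(attributes)
    while pairs:
        best = max(remaining,
                   key=lambda a: sum(table[i][a] != table[j][a]
                                     for i, j in pairs))
        chosen.append(best)
        remaining.remove(best)
        # keep only the pairs that are still indiscernible
        pairs = [(i, j) for i, j in pairs if table[i][best] == table[j][best]]
    return chosen

table = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]
decisions = [0, 1, 1]
print(greedy_reduct(table, decisions, ["a", "b", "c"]))  # ['b'] suffices here
```

Granulation enters this picture by grouping similar (interchangeable) attributes so that the greedy step considers cluster representatives instead of every attribute individually.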
27. Implementing algorithms of rough set theory and fuzzy rough set theory in the R package 'RoughSets'
- Author
- Dominik Ślęzak, Chris Cornelis, Francisco Herrera, Andrzej Janusz, José Manuel Benítez, Christoph Bergmeir, and Lala Septem Riza
- Subjects
Information Systems and Management, Discretization, Computer science, Rule induction, Dominance-based rough set approach, Feature selection, Computer Science Applications, Theoretical Computer Science, k-nearest neighbors algorithm, Data modeling, Artificial Intelligence, Control and Systems Engineering, Instance selection, Rough set, Data mining, Algorithm, Software
- Abstract
The RoughSets package, written mainly in the R language, provides implementations of methods from rough set theory (RST) and fuzzy rough set theory (FRST) for data modeling and analysis. It covers not only fundamental concepts (e.g., indiscernibility relations and lower/upper approximations), but also their applications in many tasks: discretization, feature selection, instance selection, rule induction, and nearest-neighbor-based classifiers. The package architecture and examples are presented in order to introduce it to researchers and practitioners. Researchers can build new models by defining custom functions as parameters, and practitioners are able to perform analysis and prediction of their data using the available algorithms. Additionally, we provide a review and comparison of well-known software packages. Overall, our package can be considered an alternative software library for analyzing data based on RST and FRST.
- Published
- 2014
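The fundamental RST concepts that the abstract lists (indiscernibility relations and lower/upper approximations) translate directly into code. This Python sketch mirrors the kind of computation the RoughSets R package performs, on an invented one-attribute table:

```python
# Indiscernibility classes and rough approximations, sketched in Python.
# (The table and target set are invented for illustration.)

def indiscernibility_classes(table, B):
    """Partition object indices by their values on attribute subset B."""
    classes = {}
    for i, row in enumerate(table):
        key = tuple(row[a] for a in B)
        classes.setdefault(key, set()).add(i)
    return list(classes.values())

def approximations(table, B, target):
    """Lower/upper approximation of the target set w.r.t. B-indiscernibility."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, B):
        if cls <= target:
            lower |= cls      # class entirely inside the target set
        if cls & target:
            upper |= cls      # class overlapping the target set
    return lower, upper

table = [{"a": 0}, {"a": 0}, {"a": 1}, {"a": 1}]
lower, upper = approximations(table, ["a"], target={1, 2, 3})
print(sorted(lower), sorted(upper))  # [2, 3] [0, 1, 2, 3]
```

The gap between the two sets (objects 0 and 1 here) is the boundary region: cases the attribute subset cannot classify decisively, which is where the FRST generalizations in the package come into play.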
28. Toward Interactive Attribute Selection with Infolattices – A Position Paper
- Author
- Andrzej Janusz, Marek Grzegorowski, Dominik Ślęzak, and Sebastian Stawicki
- Subjects
Information retrieval, Computer science, Feature selection, Context (language use), Data visualization, Formal concept analysis, Position paper, Rough set
- Abstract
We discuss a new approach to interactive exploration of high-dimensional data sets which is aimed at building a human's understanding of the data by iterative additions of recommended attributes and objects that can together represent a context in which it may be useful to analyze the data. We identify challenges and expected benefits that our methodology can bring to users. We also show how our ideas were inspired by Formal Concept Analysis (FCA) and Rough Set Theory (RST). It is worth emphasizing, though, that this particular paper is not aimed at investigating relationships between FCA and RST. Instead, the goal is to discuss which algorithmic methods developed within FCA and RST could be reused for the purposes of our approach.
- Published
- 2017
29. ISMIS 2017 Data Mining Competition: Trading Based on Recommendations
- Author
- Dominik Ślęzak, Kamil Żbikowski, Mathurin Aché, Marzena Kryszkiewicz, Henryk Rybiński, Andrzej Janusz, and Piotr Gawrysiak
- Subjects
Competition (economics), Scope (project management), Computer science, Data mining
- Abstract
We describe the ISMIS 2017 Data Mining Competition – “Trading Based on Recommendations” – which was held between November 22, 2016 and January 22, 2017 on the Knowledge Pit platform. We explain its scope and summarize its results. We also discuss the solution which achieved the best result among all participating teams.
- Published
- 2017
30. Helping AI to Play Hearthstone: AAIA'17 Data Mining Challenge
- Author
- Maciej Świechowski, Tomasz Tajmajer, and Andrzej Janusz
- Subjects
FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science and Game Theory (cs.GT), Artificial neural network, Scope (project management), Computer science, Monte Carlo tree search, Context (language use), Intelligent agent, Data mining
- Abstract
This paper summarizes the AAIA'17 Data Mining Challenge: Helping AI to Play Hearthstone, which was held between March 23 and May 15, 2017 on the Knowledge Pit platform. We briefly describe the scope and background of this competition in the context of a more general project related to the development of an AI engine for video games, called Grail. We also discuss the outcomes of this challenge and demonstrate how predictive models for the assessment of a player's winning chances can be utilized in the construction of an intelligent agent for playing Hearthstone. Finally, we show a few selected machine learning approaches for modeling state and action values in Hearthstone. We provide an evaluation of a few promising solutions that may be used to create more advanced types of agents, especially in conjunction with Monte Carlo Tree Search algorithms.
Comment: Federated Conference on Computer Science and Information Systems (FedCSIS 2017), Prague, Czech Republic
- Published
- 2017
- Full Text
- View/download PDF
31. Rough Set Methods for Attribute Clustering and Selection
- Author
-
Dominik Ślęzak and Andrzej Janusz
- Subjects
Reduct, Computer science, Computation, Machine learning, Artificial intelligence, Rough set, Data mining, Heuristics, Greedy algorithm, Cluster analysis - Abstract
In this study we investigate methods for attribute clustering and their possible applications to the task of computing decision reducts from information systems. We focus on high-dimensional datasets, in particular microarray data. For this type of data, the traditional reduct construction techniques can either be extremely computationally intensive or yield poor performance in terms of the size of the resulting reducts. We propose two reduct computation heuristics that combine greedy search with a diverse selection of candidate attributes. Our experiments confirm that by properly grouping similar (in some sense interchangeable) attributes, it is possible to significantly decrease computation time, as well as to increase the quality of the obtained reducts (i.e., to decrease their average size). We examine several criteria for attribute clustering, and we also identify so-called garbage clusters, which contain attributes that can be regarded as irrelevant.
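The clustering-guided greedy search described in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the data, the attribute names and the positive-region quality measure are illustrative. The key idea shown is that attributes from a cluster already represented in the partial reduct are treated as interchangeable with the chosen one and skipped.

```python
def pos_region_size(rows, decisions, attrs):
    """Number of objects whose indiscernibility class w.r.t. attrs
    contains only one decision value (the positive region)."""
    classes = {}
    for row, d in zip(rows, decisions):
        entry = classes.setdefault(tuple(row[a] for a in attrs), [0, set()])
        entry[0] += 1
        entry[1].add(d)
    return sum(n for n, ds in classes.values() if len(ds) == 1)

def greedy_reduct(rows, decisions, clusters):
    """Greedy reduct search drawing candidates only from clusters
    not yet represented in the partial reduct (assumes a consistent table)."""
    chosen = []
    while pos_region_size(rows, decisions, chosen) < len(rows):
        pool = [a for cl in clusters
                if not any(b in chosen for b in cl) for a in cl]
        if not pool:  # every cluster already contributed an attribute
            pool = [a for cl in clusters for a in cl if a not in chosen]
        chosen.append(max(pool,
                          key=lambda a: pos_region_size(rows, decisions, chosen + [a])))
    return chosen

# Toy table: decision = a XOR c, while b merely duplicates a, so the
# clustering groups the interchangeable attributes a and b together.
rows = [{"a": 0, "b": 0, "c": 0}, {"a": 0, "b": 0, "c": 1},
        {"a": 1, "b": 1, "c": 0}, {"a": 1, "b": 1, "c": 1}]
decisions = [0, 1, 1, 0]
print(greedy_reduct(rows, decisions, [["a", "b"], ["c"]]))  # ['a', 'c']
```

Once `a` is picked, the search never wastes a step evaluating its duplicate `b`, which is the computational saving the abstract refers to.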
- Published
- 2014
32. Interactive Method for Semantic Document Indexing Based on Explicit Semantic Analysis
- Author
-
Andrzej Janusz, Adam Krasuski, Hung Son Nguyen, and Wojciech Świeboda
- Subjects
Algebra and Number Theory, Information retrieval, Computer science, Semantic search, Theoretical Computer Science, Computational Theory and Mathematics, Semantic similarity, Explicit semantic analysis, Semantic computing, Semantic technology, Artificial intelligence, Natural language processing, Information Systems - Abstract
In this article we propose a general framework incorporating semantic indexing and search of texts within scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on feedback from the users in order to improve the quality of the tags it produces. In our experiments, we index our document corpus using the Explicit Semantic Analysis (ESA) method. In this algorithm, an external knowledge base is used to measure relatedness between words and concepts, and those assessments are utilized to assign meaningful concepts to given texts. In the paper, we explain how the weights expressing relations between particular words and concepts can be improved through interaction with users or by employing expert knowledge. We also present results of experiments on a document corpus acquired from the PubMed Central repository to show the feasibility of our approach.
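The ESA tagging loop with interactive weight updates can be sketched as follows. This is a minimal illustration: the hand-made `WORD_CONCEPT` table and its weights are hypothetical stand-ins for the TF-IDF word-concept scores that real ESA derives from a knowledge base.

```python
from collections import Counter

# Hypothetical word-concept association weights; in real ESA these are
# TF-IDF scores of words in knowledge-base concept articles.
WORD_CONCEPT = {
    "gene":    {"Genetics": 0.9, "Medicine": 0.3},
    "protein": {"Genetics": 0.7, "Medicine": 0.5},
    "patient": {"Medicine": 0.9},
}

def esa_tags(text, top_k=1):
    """Interpret a text as a weighted mixture of concepts and return
    the top-scoring concepts as tags."""
    scores = Counter()
    for word in text.lower().split():
        for concept, w in WORD_CONCEPT.get(word, {}).items():
            scores[concept] += w
    return [c for c, _ in scores.most_common(top_k)]

def feedback(word, concept, delta):
    """Interactive update: user feedback nudges one word-concept weight."""
    weights = WORD_CONCEPT.setdefault(word, {})
    weights[concept] = max(0.0, weights.get(concept, 0.0) + delta)

print(esa_tags("gene protein"))  # ['Genetics']
```

When users accept or reject a tag, `feedback` shifts the corresponding association weight, which is the interactive-improvement step the framework describes.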
- Published
- 2014
33. Combining multiple predictive models using genetic algorithms
- Author
-
Andrzej Janusz
- Subjects
Meta learning (computer science), Computer science, Regression analysis, Machine learning, Theoretical Computer Science, Artificial Intelligence, Genetic algorithm, Computer Vision and Pattern Recognition, Data mining - Abstract
Blending is a well-established technique, commonly used to increase the performance of predictive models. Its effectiveness has been confirmed in practice, as most recent international data-mining contest winners used some kind of committee of classifiers to produce their final entry. This paper presents a method of using a genetic algorithm to optimize an ensemble of multiple classification or regression models. An implementation of that method in the R system, called Genetic Meta-Blender, was tested during the Australasian Data Mining 2009 Analytic Challenge. The subject of this data mining competition was methods for combining predictive models. The described approach was awarded the Grand Champion prize for achieving the best overall result. In this paper, the purpose of the challenge is described and details of the winning approach are given. The results of Genetic Meta-Blender are also discussed and compared to several baseline scores. Additionally, GMB is evaluated on data from a different data mining competition, namely the SIAM SDM'11 Contest: Prediction of Biological Properties of Molecules from Chemical Structure.
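The core idea of evolving blending weights can be sketched like this. It is a generic genetic algorithm over ensemble weights, not the actual Genetic Meta-Blender (which was implemented in R); the population size, mutation scale and squared-error fitness are illustrative choices.

```python
import random

def blend_error(weights, preds, target):
    """Squared error of the weighted average of the models' predictions."""
    s = sum(weights)
    return sum(
        (sum(w * p[i] for w, p in zip(weights, preds)) / s - t) ** 2
        for i, t in enumerate(target)
    )

def genetic_blend(preds, target, pop=30, gens=60, seed=0):
    """Evolve a vector of positive blending weights, one per model."""
    rng = random.Random(seed)
    n = len(preds)
    population = [[rng.random() for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda w: blend_error(w, preds, target))
        parents = population[: pop // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]       # averaging crossover
            j = rng.randrange(n)
            child[j] = max(1e-6, child[j] + rng.gauss(0, 0.1))  # mutate one weight
            children.append(child)
        population = parents + children
    return min(population, key=lambda w: blend_error(w, preds, target))

# Toy setup: model 0 predicts the target perfectly, model 1 is constant noise,
# so the evolved weights should strongly favor model 0.
target = [0.0, 1.0, 0.0, 1.0]
preds = [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
weights = genetic_blend(preds, target)
```

The same loop extends to any number of base models; only the length of the weight vector changes.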
- Published
- 2012
34. Rough Sets
- Author
-
Dominik Slezak and Andrzej Janusz
- Published
- 2016
35. Unsupervised Similarity Learning from Textual Data
- Author
-
Dominik Ślęzak, Hung Son Nguyen, and Andrzej Janusz
- Subjects
Text corpus, Algebra and Number Theory, Information retrieval, Computer science, Semantics, Theoretical Computer Science, Computational Theory and Mathematics, Semantic similarity, Rough set, Similarity learning, Information Systems - Abstract
This paper presents research on the construction of a new unsupervised model for learning a semantic similarity measure from text corpora. The two main components of the model are a semantic interpreter of texts and a similarity function whose properties are derived from data. The first associates particular documents with concepts, defined in a knowledge base, that correspond to the topics covered by the corpus. It shifts the representation of the meaning of the texts from words, which can be ambiguous, to concepts with predefined semantics. With this new representation, the similarity function is derived from data using a modification of the dynamic rule-based similarity model, adjusted to the unsupervised case. The adjustment is based on a novel notion of an information bireduct, which has its origin in the theory of rough sets. This extension of classical information reducts is used to find diverse sets of reference documents described by diverse sets of reference concepts that determine different aspects of the similarity. The paper explains the general idea of the approach and gives some implementation guidelines. Additionally, results of preliminary experiments are presented to demonstrate the usefulness of the proposed model.
- Published
- 2012
36. Tagging Firefighter Activities at the Emergency Scene: Summary of AAIA’15 Data Mining Competition at Knowledge Pit
- Author
-
Andrzej Janusz, Dominik Slezak, Adam Krasuski, Michal Meina, Krzysztof Rykaczewski, and Bartosz Celmer
- Subjects
Data set, Decision support system, Data acquisition, Computer science, Data mining, Wireless sensor network - Abstract
In this paper, we summarize AAIA'15 data mining competition: Tagging Firefighter Activities at a Fire Scene, which was held between March 9 and July 6, 2015. We describe the scope and background of the competition. We also reveal details regarding the data set used in the competition, which was collected and tagged specifically for the purpose of this data challenge. We explain the data acquisition process which involved using a body sensor network system consisting of several inertial measurement units and a physiological data sensor. Finally, we briefly discuss submitted results with respect to their possible real-life application in our decision support system.
- Published
- 2015
37. Computation of Approximate Reducts with Dynamically Adjusted Approximation Threshold
- Author
-
Andrzej Janusz and Dominik Ślęzak
- Subjects
Data set, Theoretical computer science, Computer science, Computation, Feature selection, Relevance (information retrieval), Ranking (information retrieval) - Abstract
We continue our research on dynamically adjusted approximate reducts (DAAR). We modify the DAAR computation algorithm to take into account dependencies between attribute values in data. We discuss the motivation for this improvement and analyze its performance impact. We also revisit a filtering technique that utilizes approximate reducts to create a ranking of attributes according to their relevance. As an illustration, we study a data set from AAIA’14 Data Mining Competition.
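The underlying idea of an approximate reduct with an approximation threshold can be sketched as follows. This is a generic greedy scheme with a fixed threshold, offered only as background: the DAAR method summarized above adjusts the threshold dynamically, which this toy sketch does not attempt, and the consistency measure and data are illustrative.

```python
def consistency(rows, decisions, attrs):
    """Fraction of objects whose indiscernibility class w.r.t. attrs
    is pure, i.e. contains a single decision value."""
    classes = {}
    for row, d in zip(rows, decisions):
        entry = classes.setdefault(tuple(row[a] for a in attrs), [0, set()])
        entry[0] += 1
        entry[1].add(d)
    return sum(n for n, ds in classes.values() if len(ds) == 1) / len(rows)

def approximate_reduct(rows, decisions, attrs, eps):
    """Greedily add attributes until consistency reaches 1 - eps
    (eps is the allowed degree of approximation)."""
    chosen, remaining = [], list(attrs)
    while consistency(rows, decisions, chosen) < 1 - eps and remaining:
        best = max(remaining,
                   key=lambda a: consistency(rows, decisions, chosen + [a]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy table: attribute 'a' explains part of the decision, 'b' adds a bit more.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0},
        {"a": 1, "b": 1}, {"a": 1, "b": 0}]
decisions = [0, 0, 1, 1, 0]
print(approximate_reduct(rows, decisions, ["a", "b"], eps=0.65))  # ['a']
```

Loosening `eps` yields shorter attribute sets, which is the basic trade-off that approximation-threshold methods tune.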
- Published
- 2015
38. Rough Set Tools for Practical Data Exploration
- Author
-
Dominik Ślęzak, Sebastian Stawicki, Andrzej Janusz, and Marcin Szczuka
- Subjects
Data exploration, Software, Computer science, Integrated software, Computational statistics, Rough set, Software system, Software engineering - Abstract
We discuss a rough-set-based approach to the data mining process. We present a brief overview of rough-set-based data exploration and of the software systems developed for this purpose over the years. Then, we introduce the RapidRoughSets extension for the RapidMiner integrated software platform for machine learning and data mining, along with the RoughSets package for the R System – the leading software environment for statistical computing. We conclude with a discussion of the road ahead for rough set software systems.
- Published
- 2015
39. Mining Data from Coal Mines: IJCRS’15 Data Challenge
- Author
-
Marek Grzegorowski, Marek Sikora, Dominik Ślęzak, Andrzej Janusz, Sebastian Stawicki, Łukasz Wróbel, and Piotr Wojtas
- Subjects
Computer science, Coal mining, Active safety, Data science - Abstract
We summarize the data mining competition associated with the IJCRS’15 conference – IJCRS’15 Data Challenge: Mining Data from Coal Mines – organized on the Knowledge Pit web platform. The topic of this competition was related to the problem of active safety monitoring in underground corridors. In particular, the task was to design an efficient method of predicting dangerous concentrations of methane in longwalls of a Polish coal mine. We describe the scope and motivation for the competition. We also report the course of the contest and briefly discuss a few of the most interesting solutions submitted by participants. Finally, we reveal our plans for future research on this important subject.
- Published
- 2015
40. Assessment of data granulations in context of feature extraction problem
- Author
-
Andrzej Janusz and Marcin Szczuka
- Subjects
Quality assessment, Feature extraction, Feature selection, Pattern recognition, Granulation, Decision system, Data mining, Artificial intelligence, Random variable, Mathematics - Abstract
In this paper we investigate a method of measuring the quality of a data granulation in a decision system, defined by an indiscernibility relation in a specific type of approximation space. In the proposed algorithm, the concept of a random probe is used to estimate the probability that a given data granulation is relevant in a classification context. We explain the intuition behind our approach and show how it can be utilized in practical data analysis tasks such as attribute selection or the construction of new attributes. We also inspect relationships between the problem of finding a useful granulation of data and that of extracting informative features for supervised classification. To avoid deriving granules of low relevance, we perform a random probe test to verify their validity. Using this technique we can more objectively assess the usefulness of a given data granulation for solving the classification problem at hand.
- Published
- 2014
41. Key Risk Factors for Polish State Fire Service: a Data Mining Competition at Knowledge Pit
- Author
-
Adam Krasuski, Mariusz Rosiak, Andrzej Janusz, Hung Son Nguyen, Dominik Slezak, and Sebastian Stawicki
- Subjects
Computer science, Data analysis, Data mining - Abstract
In this paper we summarize AAIA'14 Data Mining Competition: Key Risk Factors for Polish State Fire Service, which was held between February 3, 2014 and May 5, 2014 on the Knowledge Pit platform http://challenge.mimuw.edu.pl/. We describe the scope and background of this competition and explain the evaluation procedure in detail. We also briefly overview the results of this analytical challenge, showing how those results can benefit another of our projects, one related to the problem of improving firefighter safety at a fire scene. Finally, we reveal some technical details regarding the architecture and functionalities of the Knowledge Pit competition platform, which we are developing in order to facilitate the solving of practical problems that require advanced data analytics.
- Published
- 2014
42. Algorithms for Similarity Relation Learning from High Dimensional Data
- Author
-
Andrzej Janusz
- Subjects
Machine learning, Semantic similarity, Normalized compression distance, Case-based reasoning, Artificial intelligence, Cluster analysis, Algorithm, Similarity learning, Mathematics - Abstract
The notion of similarity plays an important role in machine learning and artificial intelligence. It is widely used in tasks related to supervised classification, clustering, outlier detection and planning. Moreover, in domains such as information retrieval or case-based reasoning, the concept of similarity is essential, as it is used at every phase of the reasoning cycle. Similarity itself, however, is a very complex concept that eludes formal definition. The similarity of two objects can differ depending on the considered context. In many practical situations it is difficult even to evaluate the quality of similarity assessments without considering the task for which they were performed. For this reason, similarity should be learned from data, specifically for the task at hand. This paper presents research on the problem of similarity learning, which is part of the author’s PhD dissertation. It describes a similarity model, called Rule-Based Similarity, and shows algorithms for constructing this model from available data. The model utilizes notions from rough set theory to derive a similarity function that approximates the similarity relation in a given context. It is largely inspired by the idea of Tversky’s feature contrast model and has several analogous properties. In the paper, those theoretical properties are described and discussed. Moreover, the paper presents results of experiments on real-life data sets, in which the quality of the proposed model is thoroughly evaluated and compared with state-of-the-art algorithms.
- Published
- 2014
43. A Resemblance Based Approach for Recognition of Risks at a Fire Ground
- Author
-
Łukasz Sosnowski, Andrzej Pietruszka, Adam Krasuski, and Andrzej Janusz
- Subjects
Decision support system, Computer science, Firefighting, Process mining, Timeline, Machine learning, Artificial intelligence - Abstract
This article focuses on the problem of comparing fire & rescue actions for decision support at the fire ground. In our research, we split the actions into a set of frames which compose a timeline of the firefighting process. In our approach, the frames are represented as compound objects. We extract a set of features in order to represent these objects and we apply a comparator framework for the evaluation of similarities between the processes. The similarity constraints allow us to recognize the risks that appear during the actions. We justify our approach by showing the results of a series of experiments based on reports describing real-life incidents.
- Published
- 2014
44. Random Probes in Computation and Assessment of Approximate Reducts
- Author
-
Dominik Ślęzak and Andrzej Janusz
- Subjects
Reduct, Clustering high-dimensional data, Computer science, Computation, Decision vector, Feature selection, Greedy algorithm - Abstract
We discuss applications of random probes in the process of computing and assessing approximate reducts. By random probes we mean artificial attributes, generated independently from the decision vector but having value distributions similar to the attributes in the original data. We introduce the concept of a randomized reduct, which is a reduct constructed solely from random probes, and we show how to use it for unsupervised evaluation of attribute sets. We also propose a modification of the greedy heuristic for the computation of approximate reducts, which reduces the chance of including irrelevant attributes in a reduct. To support our claims we present the results of experiments on high-dimensional data. Analysis of the obtained results confirms the usefulness of random probes in the search for informative attribute sets.
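The random-probe idea can be sketched as a simple permutation test. The code is illustrative, not the authors' algorithm: the purity-style attribute score is a stand-in for whatever quality measure a reduct heuristic uses, and the probe here is obtained by shuffling the attribute column, which preserves its value distribution while breaking any link to the decision.

```python
import random

def attribute_score(column, decisions):
    """Purity-style quality: fraction of objects lying in value groups
    that carry a single decision."""
    groups = {}
    for v, d in zip(column, decisions):
        entry = groups.setdefault(v, [0, set()])
        entry[0] += 1
        entry[1].add(d)
    return sum(n for n, ds in groups.values() if len(ds) == 1) / len(column)

def probe_pvalue(column, decisions, n_probes=200, seed=0):
    """How often does a random probe (a shuffled copy of the column)
    score at least as well as the real attribute?"""
    rng = random.Random(seed)
    real = attribute_score(column, decisions)
    hits = 0
    for _ in range(n_probes):
        probe = list(column)
        rng.shuffle(probe)
        hits += attribute_score(probe, decisions) >= real
    return hits / n_probes

decisions = [0, 0, 1, 1, 0, 1, 0, 1]
informative = list(decisions)   # perfectly mirrors the decision
irrelevant = list(range(8))     # all-distinct values: trivially "pure"
```

A low p-value for `informative` and the maximal p-value for `irrelevant` illustrate how probes expose attributes whose apparent quality is merely an artifact of their value distribution.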
- Published
- 2014
45. Multi-label Classification of Biomedical Articles
- Author
-
Marcin Tatjewski, Hung Son Nguyen, Andrzej Janusz, Krzysztof Pawłowski, Łukasz Romaszko, and Karol Kurach
- Subjects
Multi-label classification, Computer science, Search engine indexing, Machine learning, Ensemble learning, Binary classification, Explicit semantic analysis, Artificial intelligence - Abstract
In this paper we investigate a special case of the classification problem, called multi-label learning, where each instance (or object) is associated with a set of target labels (or simple decisions). Multi-label classification is one of the most important issues in semantic indexing and text categorization systems. Most multi-label classification methods are based on combinations of binary classifiers, which are trained separately for each label. In this paper we concentrate on the application of ensemble techniques to the multi-label classification problem. We present the most recent ensemble methods for both the binary classifier training phase and the combination learning phase. The proposed methods have been implemented within the SONCA system, which is a part of the SYNAT project. We present some experimental results obtained on the PubMed Central database of biomedical articles.
- Published
- 2013
46. Semantic Clustering of Scientific Articles Using Explicit Semantic Analysis
- Author
-
Andrzej Janusz and Marcin Szczuka
- Subjects
Text corpus, Information retrieval, Computer science, Semantics, Information extraction, Knowledge base, Explicit semantic analysis, Rough set, Cluster analysis - Abstract
This paper summarizes our recent research on semantic clustering of scientific articles. We present a case study focused on the analysis of papers related to Rough Sets theory. The proposed method groups the documents on the basis of their content, with the assistance of the DBpedia knowledge base. The text corpus is first processed using Natural Language Processing tools in order to produce vector representations of the content. In the second step the articles are matched against a collection of concepts retrieved from DBpedia. As a result, a new representation is constructed that better reflects the semantics of the texts. With this new representation the documents are hierarchically clustered in order to form a partitioning of papers into semantically related groups. The steps in textual data preparation, the utilization of DBpedia and the employed clustering methods are explained and illustrated with experimental results. The quality of the resulting clustering is then discussed. It is assessed using feedback from human experts combined with typical cluster quality measures. These results are then discussed in the context of a larger framework that aims to facilitate search and information extraction from large textual repositories.
- Published
- 2013
47. Semantic Clustering of Scientific Articles with Use of DBpedia Knowledge Base
- Author
-
Andrzej Janusz, Marcin Szczuka, and Kamil Herba
- Subjects
Text corpus, Information retrieval, Computer science, Semantics, Text mining, Semantic similarity, Knowledge base, Rough set, Cluster analysis - Abstract
A case study of semantic clustering of scientific articles related to Rough Sets is presented. The proposed method groups the documents on the basis of their content and with the assistance of the DBpedia knowledge base. The text corpus is first processed with Natural Language Processing tools in order to produce vector representations of the content and then matched against a collection of concepts retrieved from DBpedia. As a result, a new representation is constructed that better reflects the semantics of the texts. With this new representation, the documents are hierarchically clustered in order to form a partition of papers into semantically related groups. The steps in textual data preparation, the utilization of DBpedia and the clustering are explained and illustrated with experimental results. An assessment of clustering quality by human experts and by comparison to a traditional approach is presented.
- Published
- 2012
48. Interactive Document Indexing Method Based on Explicit Semantic Analysis
- Author
-
Andrzej Janusz, Adam Krasuski, Wojciech Świeboda, and Hung Son Nguyen
- Subjects
Information retrieval, Computer science, Search engine indexing, Semantic search, Interactive learning, Knowledge base, Explicit semantic analysis, Artificial intelligence, Natural language processing - Abstract
In this article we propose a general framework incorporating semantic indexing and search of texts within scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on feedback from the users in order to improve the quality of the tags it produces. In our experiments, we index our document corpus using the Explicit Semantic Analysis (ESA) method. In this algorithm, an external knowledge base is used to measure relatedness between words and concepts, and those assessments are utilized to assign meaningful concepts to given texts. In the paper, we explain how the weights expressing relations between particular words and concepts can be improved through interaction with users or by employing expert knowledge. We also present results of experiments on a document corpus acquired from the PubMed Central repository to show the feasibility of our approach.
- Published
- 2012
49. Dynamic Rule-Based Similarity Model for DNA Microarray Data
- Author
-
Andrzej Janusz
- Subjects
Reduct, Pattern recognition, Semantic similarity, Feature (machine learning), Artificial intelligence, Rough set, Data mining, Mathematics - Abstract
Rule-Based Similarity (RBS) is a framework in which concepts from rough set theory are used for learning a similarity relation from data. This paper presents an extension of RBS called the Dynamic Rule-Based Similarity model (DRBS), which is designed to boost the quality of the learned relation in the case of high-dimensional data. Rule-Based Similarity utilizes the notion of a reduct to construct new features which can be interpreted as important aspects of similarity in the classification context. Having defined such features, it is possible to utilize the idea of Tversky's feature contrast similarity model in order to design an accurate and psychologically plausible similarity relation for a given domain of objects. DRBS tries to incorporate a broader array of aspects of similarity into the model by constructing many heterogeneous sets of features from multiple decision reducts. To ensure diversity, the reducts are computed on random subsets of objects and attributes. This approach is particularly well suited to the "few-objects-many-attributes" problem, such as the mining of DNA microarray data. The induced similarity relation and the resulting similarity function can be used to perform accurate classification of previously unseen objects in a case-based fashion. Experiments, whose results are also presented in the paper, show that the proposed model can successfully compete with other state-of-the-art algorithms such as Random Forest or SVM.
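Tversky's feature contrast model, which the abstract above builds on, is easy to state in code. The sketch below is illustrative only: the reference cases and rule-derived feature names (`r1`, `r2`, ...) are made up, and real DRBS derives such feature sets from multiple decision reducts rather than taking them as given.

```python
def tversky_sim(a, b, alpha=0.5, beta=0.5):
    """Feature contrast: shared features raise similarity, while
    features distinctive to either object lower it."""
    a, b = set(a), set(b)
    return len(a & b) - alpha * len(a - b) - beta * len(b - a)

def classify(query_features, cases):
    """Case-based classification: label of the most similar reference case."""
    return max(cases, key=lambda case: tversky_sim(query_features, case[0]))[1]

# Hypothetical rule-derived features of two reference cases:
cases = [({"r1", "r2"}, "pos"), ({"r3"}, "neg")]
print(classify({"r1", "r2", "r3"}, cases))  # pos
```

The asymmetry parameters `alpha` and `beta` weight the two sets of distinctive features separately, which is what makes the model psychologically plausible (similarity judgments need not be symmetric).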
- Published
- 2012
50. JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers
- Author
-
Hung Son Nguyen, Sebastian Stawicki, Adam Krasuski, Andrzej Janusz, and Dominik Ślęzak
- Subjects
Multi-label classification, Information retrieval, Test data generation, Computer science, Data science, Explicit semantic analysis, Scalability, Data mining - Abstract
We summarize the JRS’2012 Data Mining Competition on “Topical Classification of Biomedical Research Papers”, held between January 2, 2012 and March 30, 2012 as an interactive on-line contest hosted on the TunedIT platform ( http://tunedit.org ). We present the scope and background of the challenge task, the evaluation procedure, the progress, and the results. We also present a scalable method for generating the contest data from biomedical research papers.
- Published
- 2012