
Optimal Subgroup Discovery in Purely Numerical Data

Authors :
Rémy Cazabet
Jean-François Boulicaut
Alexandre Millot
Data Mining and Machine Learning (DM2L)
Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS)
Institut National des Sciences Appliquées de Lyon (INSA Lyon)
Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL)
Université de Lyon-École Centrale de Lyon (ECL)
Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon)
Université de Lyon-Université Lumière - Lyon 2 (UL2)
Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL)
Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL)
Université de Lyon-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées de Lyon (INSA Lyon)
Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL)
Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)
Source :
Advances in Knowledge Discovery and Data Mining: 24th Pacific-Asia Conference, PAKDD 2020, Singapore (online), May 11–14, 2020, Proceedings, PAKDD (2), pp. 112-124, ISBN: 9783030474355, ⟨10.1007/978-3-030-47436-2_9⟩
Publication Year :
2020
Publisher :
HAL CCSD, 2020.

Abstract

Subgroup discovery in labeled data is the task of discovering patterns in the description space of objects to find subsets of objects whose labels show an interesting distribution, for example the disproportionate representation of a label value. Discovering interesting subgroups in purely numerical data (numerical attributes and a numerical target label) has received little attention so far. Existing methods rely on discretization, which loses information and leads to suboptimal results. This is the case for the reference algorithm SD-Map*. We consider here the discovery of optimal subgroups according to an interestingness measure in purely numerical data. We leverage the concept of closed interval patterns and advanced enumeration and pruning techniques. The performance of our algorithm is studied empirically, and its added value with respect to SD-Map* is illustrated.
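
To make the task in the abstract concrete, the following is a minimal sketch of subgroup discovery over interval patterns in purely numerical data. It is not the authors' algorithm: it enumerates every interval pattern by brute force, with none of the closed-pattern enumeration or pruning the abstract mentions, so it is only usable on tiny data. The function names, the toy data, and the mean-shift quality measure (relative subgroup size times the gain in mean target value) are assumptions chosen for illustration.

```python
from itertools import combinations_with_replacement, product


def interval_candidates(values):
    """All closed intervals [lo, hi] whose bounds are observed values."""
    distinct = sorted(set(values))
    return list(combinations_with_replacement(distinct, 2))


def quality(subgroup_targets, all_targets):
    """Assumed measure: relative size times the gain in mean target."""
    if not subgroup_targets:
        return float("-inf")
    mean_all = sum(all_targets) / len(all_targets)
    mean_sub = sum(subgroup_targets) / len(subgroup_targets)
    return len(subgroup_targets) / len(all_targets) * (mean_sub - mean_all)


def best_subgroup(rows, targets):
    """Score every interval pattern exhaustively (no pruning)."""
    n_attrs = len(rows[0])
    per_attr = [interval_candidates([r[j] for r in rows])
                for j in range(n_attrs)]
    best_pattern, best_score = None, float("-inf")
    # A pattern is one interval per attribute; it covers a row when
    # every attribute value falls inside its interval.
    for pattern in product(*per_attr):
        covered = [t for r, t in zip(rows, targets)
                   if all(lo <= r[j] <= hi
                          for j, (lo, hi) in enumerate(pattern))]
        score = quality(covered, targets)
        if score > best_score:
            best_pattern, best_score = pattern, score
    return best_pattern, best_score


# Hypothetical toy data: two numerical attributes, one numerical target.
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)]
targets = [1.0, 2.0, 2.0, 8.0, 9.0]
pattern, score = best_subgroup(rows, targets)
print(pattern, score)
```

On this toy data the best pattern restricts both attributes to [2, 3], covering the two objects with the highest target values. Because interval bounds are taken from observed values rather than from a fixed discretization, no candidate subgroup is lost to binning, but the number of candidates grows quickly with the number of distinct values and attributes, which is why efficient enumeration and pruning matter.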

Details

Language :
English
ISBN :
978-3-030-47435-5
Database :
OpenAIRE
Journal :
Advances in Knowledge Discovery and Data Mining: 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11–14, 2020, Proceedings, pp. 112-124
Accession number :
edsair.doi.dedup.....23389d690a6f14dd9029316dfad56ad4
Full Text :
https://doi.org/10.1007/978-3-030-47436-2_9