1. Applying mutual information for discretization to support the discovery of rare-unusual association rule in cerebrovascular examination dataset.
- Author
- Wulandari, Chandrawati Putri; Ou-Yang, Chao; and Wang, Han-Cheng
- Subjects
- *DISCRETIZATION methods, *DATA mining, *MEDICAL records, *DATA extraction, *STROKE
- Abstract
Highlights
• A discretization approach based on the mutual information concept was proposed.
• This research aims to extract RUARs in accordance with the discretization result.
• A large dataset with about 12,000 medical data records is used to generate RUARs.
• Our MID method provides a high coverage ratio of RUARs.
• Graph-based visualization is used to represent RUARs clearly and easily.

ABSTRACT In knowledge discovery studies, association rule mining has been extensively studied to discover hidden knowledge and relationships among sets of items in a transactional dataset. Most research on association rule mining focuses on discovering frequent patterns based on the most frequent items occurring in the dataset. However, the process of extracting rare rules has received less attention. In medical dataset studies, the discovery of rare association rules (RARs) is more challenging, but it can give physicians rare and unusual knowledge in addition to frequent association rules. Hence, the aim of this paper is to discover non-frequent or rare-unusual association rules (RUARs) from a stroke medical dataset to provide potentially meaningful knowledge to domain users. A discretization method needs to be performed as the data preprocessing step before generating rules. To the best of our knowledge, few studies have focused on the role of discretization results in supporting the extraction of more and higher-quality RUARs, particularly for medical datasets. In addition, the extracted RUARs are expected to provide potential new and unusual insights into stroke risk patterns. This paper applies a mutual information measure to discretize a stroke examination dataset collected from a medical center in Taiwan. An interval merging method was proposed to simplify the discrete form and enrich the quality of the generated rules. Finally, rare association rules, with relatively low support, were generated by employing the Apriori-Rare method. In addition, a filtering process was applied to the content of the rule itemsets to discover the expected set of RUARs for physicians. Furthermore, the extracted RUARs were analyzed based on their relative risk values with respect to the occurrence of stroke. Results indicated that the mutual information discretization outperformed the traditional discretization methods: its discretization scheme supported the extraction of RUARs with better quantity and quality measurements for further analysis from a medical point of view. Moreover, the proposed method produced a relatively higher number of RUARs. The knowledge of unusual rule patterns from rare association rules might provide potential new and unusual insights for medical practitioners and increase awareness of stroke examination results. [ABSTRACT FROM AUTHOR]
- Published
- 2019
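Note on the abstract: the record does not describe how the mutual information discretization is computed, so the following is only a minimal sketch of the general idea, not the authors' MID method. It assumes a single numeric attribute, a binary class label, and a single cut point chosen to maximize mutual information with the label. The function names `mutual_information` and `best_cut_point` and the toy values are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y), in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def best_cut_point(values, labels):
    """Return (cut, mi) for the binary split `value <= cut` that maximizes MI with labels.

    Assumes at least two distinct values; otherwise there is nothing to split.
    """
    # The largest value is skipped: cutting there would leave one bin empty.
    candidates = sorted(set(values))[:-1]
    scored = [
        (cut, mutual_information([v <= cut for v in values], labels))
        for cut in candidates
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data (not from the paper): an examination value and a stroke flag.
systolic = [110, 118, 125, 132, 141, 150, 158, 165]
stroke = [0, 0, 0, 0, 1, 1, 1, 1]

cut, mi = best_cut_point(systolic, stroke)
print(f"cut at <= {cut}, mutual information = {mi:.3f} bits")
```

In this toy case the split separates the two classes perfectly, so the mutual information equals the full one bit of label entropy. A multi-interval scheme with interval merging, as the abstract mentions, would need further steps that the record does not specify.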