Back to Search Start Over

Canonical forms for labelled trees and their applications in frequent subtree mining

Authors :
Yun Chi
Richard R. Muntz
Yirong Yang
Source :
Knowledge and Information Systems. 8:203-234
Publication Year :
2005
Publisher :
Springer Science and Business Media LLC, 2005.

Abstract

Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we first present two canonical forms for labelled rooted unordered trees–the breadth-first canonical form (BFCF) and the depth-first canonical form (DFCF). Then the canonical forms are applied to the frequent subtree mining problem. Based on the BFCF, we develop a vertical mining algorithm, RootedTreeMiner, to discover all frequently occurring subtrees in a database of labelled rooted unordered trees. The RootedTreeMiner algorithm uses an enumeration tree to enumerate all (frequent) labelled rooted unordered subtrees. Next, we extend the definition of the DFCF to labelled free trees and present an Apriori-like algorithm, FreeTreeMiner, to discover all frequently occurring subtrees in a database of labelled free trees. Finally, we study the performance and the scalability of our algorithms through extensive experiments based on both synthetic data and datasets from real applications.

Details

ISSN :
02193116 and 02191377
Volume :
8
Database :
OpenAIRE
Journal :
Knowledge and Information Systems
Accession number :
edsair.doi...........e7b2aebe69216865268b668c5724a2ba
Full Text :
https://doi.org/10.1007/s10115-004-0180-7