Abstract:
|
A simple and efficient algorithm for an extension of the subtree isomorphism
problem is presented that computes a certificate for each rooted subtree of a
given forest, thereby partitioning the set of rooted subtrees into
isomorphism equivalence classes. The partitioning can be used to find all
occurrences of a pattern tree in a text tree, or even all occurrences of
every subtree of a pattern in a text. The algorithm handles multiple pattern
trees as well as multiple text trees. The method combines a bottom-up forest
traversal algorithm with a
simple numbering scheme for rooted trees. The algorithm runs in expected time
linear in the number of nodes, and can be applied to rooted trees of
unbounded degree, either unordered or ordered, labeled or unlabeled. The
algorithm also solves the problem of finding all $k$-th largest common
subtrees and all $k$-th most often repeated subtrees. A C++ implementation of
the algorithm using LEDA is given in full detail. |
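
As a rough illustration of the idea summarized above (a bottom-up traversal that assigns each rooted subtree a number determined by the numbers of its children), the following is a minimal sketch for unordered, unlabeled trees. It is not the paper's LEDA implementation: the function name `subtree_certificates` and the parent-array input format are assumptions made for this example, and it uses only the C++ standard library. Because it relies on `std::sort` and `std::map`, it does not attempt to reach the expected linear running time stated in the abstract.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical helper for illustration only (not the paper's LEDA code).
// Input: parent[v] is the parent of node v, or -1 if v is a root.
//        The array is assumed to describe a valid forest (no cycles).
// Output: cert, where cert[u] == cert[v] exactly when the subtrees rooted
//         at u and v are isomorphic as unordered, unlabeled rooted trees.
std::vector<int> subtree_certificates(const std::vector<int>& parent) {
    const int n = static_cast<int>(parent.size());
    std::vector<std::vector<int>> children(n);
    std::vector<int> pending(n, 0);  // children of v not yet numbered

    for (int v = 0; v < n; ++v) {
        if (parent[v] >= 0) {
            assert(parent[v] < n);
            children[parent[v]].push_back(v);
            ++pending[parent[v]];
        }
    }

    // Bottom-up traversal: leaves are ready first, and a node becomes
    // ready once all of its children have received a certificate.
    std::vector<int> ready;
    for (int v = 0; v < n; ++v) {
        if (pending[v] == 0) ready.push_back(v);
    }

    // Numbering scheme: each distinct sorted list of child certificates
    // gets the next unused positive integer.
    std::map<std::vector<int>, int> number_of;
    int next_number = 1;
    std::vector<int> cert(n, 0);

    while (!ready.empty()) {
        const int v = ready.back();
        ready.pop_back();

        std::vector<int> key;
        for (int c : children[v]) key.push_back(cert[c]);
        // Sorting makes the key independent of child order (unordered
        // trees). For ordered trees this step would be omitted.
        std::sort(key.begin(), key.end());

        auto it = number_of.find(key);
        if (it == number_of.end()) {
            it = number_of.emplace(key, next_number++).first;
        }
        cert[v] = it->second;

        const int p = parent[v];
        if (p >= 0 && --pending[p] == 0) ready.push_back(p);
    }
    return cert;
}
```

For example, with the parent array `{-1, 0, 0, 1, 1, -1, 5, 5}`, nodes 0 to 4 form a text tree and nodes 5 to 7 a pattern tree. Nodes 1 and 5 receive the same certificate, so the pattern occurs in the text at node 1. Supporting labeled trees would presumably require including each node's label in the key.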