How to check whether two trees have identical structure ?
Joined: May 02, 2005
I am trying to figure out a way to check whether two multiway trees are identical or not in java. When i say identical, i mean whether two trees have the same physical layout ( given that rearranging the nodes is allowed )..
I dont care abt the values of each node in the tree and am concerned only abt finding whether two trees have identical structure. Can anyone suggest me what data structure I can use to store the above data so that I can later compare them to see whether they are identical.
What i am planning to do is to convert each tree into some numeric values and then compare them. For instance..
|-- D | A--|-- C | |-- B can be converted to 13000 [ 1 stands for first node... 3 stands for 3 child node of first node.. followed by 3 zeros since none of the child nodes ( D, C , B) have further branches.
But my algorithm is coming to be little complex when tree becomes complex. One tree is getting mapped to too many numeric values and hence comparision is becoming tedious. Any suggestions/pointer/help would be appreciated.
Do you know how to do a recursive walk through a tree? If each node has right & left children it might look like:
Or do your nodes have collections of children? You showed one with three.
What you need is some way to track the progress through the tree. Let's see if we can make visit() return something ...
Try that with pencil & paper on some of your sample trees and see if it gives a string representation of the structure with none of the content.
Does that give you something to try?
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Joined: May 02, 2005
Really appreciate your quick reply..I totally understand what you are trying to hint here through your code..
But I think I didn't make my point across properly in my post..I am quite comfortable with the tree parsing stuff and infact am confident of converting any idea into code.. But what i am looking for is some good idea that would accomplish my task of mapping the tree into some output String or number.. Let me tell what i am trying to achieve here by comparing two trees.. I am trying to dynamically come up with a flowmap ( an jpeg image ) that consists of number of blocks with lines connecting them just like the one i had drawn as sample. And the text on each block of jpeg image is editable and hence can be changed run time.
I will be forming a tree structure during run time basen on some run time parameters and once tree is formed i need to select a flowmap image ( from the bunch i have ) that represent the tree struture i formed..
Now, in the run time I can form any of the following tree structure
but since the flow ( represented by tree ) is similar for all the above trees, i should be selecting the same flowmap image for all the above.. In other words all the above trees should be mapped to same flowmap image. I cant afford to have three flowmap images for the same flow in the server. Now my question is how can i do the mapping such that all 3 gives same output string that I can map to some flowmap image ( and ofcourse no other tree structure should generate this output string ) ..One of the mapping that i can think off top of my head now for sample tree 3 is assigning
1**1 ( 1 to the power of 1 ) to A = 1
2**1 to B = 2
2**2 to C = 4
2**3 to D = 8
3**1 to E = 3
3**2 to F = 9
3**1 to G = 3
3**2 to H = 9
So the total for graph 3 comes to 39 .. Same way sample 1) and 2) will give a count of 39 which i can map to a particular flowmap image..
But the mapping i had used above wont work for all the scenarios and there may be a possibility that totally different tree can give the same output number 39. So i am looking for some good mapping alogrithm that will generate same unique output string for similar trees and different for non-similar trees.
I think i should involve some prime numbers for mapping.what do you say?
Thanks in advance..
[ May 16, 2005: Message edited by: Sanjay Murugesan ] [ May 16, 2005: Message edited by: Sanjay Murugesan ]
Joined: Jan 29, 2003
The trace I sketched above would be ambiguous. You could uniquely name each node by the path you took to get there ... left, middle or right branch. Your last graph would come out
Um, Stephan, define similar and identical? We might or might not agree here.
I think the looonnnggg and short solutions I suggested will both find identical structure, ie each corresponding node has the same number of children or the same right & left branches, while ignoring the content completely. Was that the question?
Joined: Feb 18, 2005
Originally posted by Stan James: Was that the question?
No, the task was to find trees that are identical even if you ignore left versus right.
Case 1: Obviously if your trees only have one node each, they're "similar".
Case 2: If one tree has a parent node and a child to the left whiloe the second tree has a parent and a child to the right, they are "similar" but not "identical".
I think Sanjay is looking for a way to determine if two trees are similar.
I think Geoffrey hit the nail on the head with his "canonical form" suggestion. (And the "margin" comment was beautiful.)
EDIT: Wait, I think a modification of some of Stan's earlier code should do the trick, at least for binary trees:
To extend this to N-ary trees, you need some way to compare all combinations of children between nodeA and nodeB. i.e. You need to be able to find a 1-1 pairing between children of nodeA and children of nodeB where each pair returns true for visit(). If there is such a pairing, then visit(nodeA, nodeB) is true.
Ryan [ May 18, 2005: Message edited by: Ryan McGuire ]
@Stan: Ryan explained what I meant and what was mentioned in the thread-opening.
btw.: my name is Stefan, though Stephan is somewhat similar
Joined: Jan 29, 2003
Woops, sorry about the name. I scrolled back to look and must have changed the spelling in my head while scrolling back. And I really did miss the question. Rats.
My recursive method could be adapted to not care whether it goes right or left. I think it would have to use brute force to try every possible path through the tree. Comparing A's child-1 to B's child-1 might fail while comparing A's child-1 to B's child-2 would match.
I'm stumped on what the canonical description of the flow would look like to make the three samples given near the top work out. Maybe you could walk the tree once and annotate each node with the total number of children that can be reached from it, then convert them to text in that order? [ May 19, 2005: Message edited by: Stan James ]
Joined: Jan 29, 2003
Ok, build a list of paths to the leaves. The path is the number of children for each node along the way. The three trees in the sample at the top yield the same list:
3 3.2 3.2 3.2 3.2
Of course they add the entries in different orders. We have to sort them before comparing the lists. If you can have more than 9 children, pad the numbers with leading zeros to make the strings sort right.
Here's a node that makes such a list. It was a few minutes work; it puts a zero on the end for its own number of children (that's a feature for one-node trees) and doesn't put in the dots. I'll leave that to the reader along with sorting and comparing the two lists.
I used insertion-order-preserving linked lists and sets to facilitate unittesting. Some kind of sorting List for the results would be a nice feature. A sorting Set is right out becuse it doesn't allow duplicates. [ May 19, 2005: Message edited by: Stan James ]
Joined: Feb 21, 2002
You could put the result in a hash table where the path signature/pattern is the key and the value is the number of times the path signature/pattern occured in the tree. This could be put into an class where you override the the equals method to allow/encapsulate the comparison.
...or if you want to make it really fast you could build a master path pattern for the entire tree by sorting the path signatures by their occurence (primary) and then by their signature pattern itself (secondary). If you then built a string out of them in this order, you have what I think would be a decent key that you could then store in a database with your flows, images etc. Then you could evaluate a situation, build it's key and then use the power of the database to find various alternatives.