Right now I´m trying to find out all equally named nodes for two different XMLs, saving the found node names inside a StringList and use this list for the user to choose between these nodes.
The idea behind all of this is simple: When I try to overload a section, I now want two big XMLs to switch certain sections, that both of them have(at least equally named sections). The user can choose the section he wants to switch from a Combobox. To prevent mistakes, I figured out, I would compare all nodes first and offer the user just the equally named sections, found in both XMLs, so I prevent the user from trying to switch the <info>-section from XML A with the <person>-section in XML B.
So far,so good. It works just fine, but I have a big performance problem:If both XMLs have more than 300 nodes each, comparing them still works fine, but it takes a lot of time before the program continues its work. The user GUI is frozen until the node comparing process is done. Now, I worked around it so far by warning the user, if a certain number of nodes is found in both documents and I let him type in the desired node name manually. (of course I then check if both documents have this node ;) )
But is there an easier/better way to find out all equally named sections for two different XMLs?
Here´s my code(which works just fine except the performance for bigger XMLs):
I use this method to create two ArrayLists holding ALL nodes of the corresponding XML file.
Afterwards, I compare the two ArrayLists just like this:
And that´s exactly where the performance problem lies. Does anyone have a suggestion how to better the performance issue?
This was interesting to me, because intuitively the original 2 'for-loops' and the 'for-loop' with List.contains() should operate at about the same speed. Ignoring the case-sensitivity issue for a moment, ArrayList.contains() does what the inner loop does, comparing each item in List2 to the current item in List1 until a match is found. It seems like there's some other side-effect causing the noticeable performance hit. In either case, it seems to me that this search should operate in O(n^2).
Having said that, be sure that the change
from myStringList.get(x).equalsIgnoreCase(myReplaceStringList.get(y)) to myReplaceStringList.contains(myStringList.get(x)) does what you want it to do. List.contains(Object o) is defined to compare using o.equals(), which will not ignore case for String objects.
To keep the case-insensitivity (and potentially improve performance), you could use a HashSet of upper-cased items and test for membership like this:
This method should operate in O(n). For smaller numbers of items, it might not be worth it to calculate all the hash codes. However, if your application needs to scale, it might be worth it to make a test with lists of size 10^6 strings.
Joined: May 31, 2011
Good thought Drew. Welcome (back) to Ranch
subject: Performance Problem Searching Equally Named Nodes In Two XMLs