aspose file tools*
The moose likes XML and Related Technologies and the fly likes Performance Problem Searching Equally Named Nodes In Two XMLs Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Engineering » XML and Related Technologies
Bookmark "Performance Problem Searching Equally Named Nodes In Two XMLs" Watch "Performance Problem Searching Equally Named Nodes In Two XMLs" New topic
Author

Performance Problem Searching Equally Named Nodes In Two XMLs

Randy Miller
Ranch Hand

Joined: Feb 13, 2012
Posts: 44
Hi guys!

Right now I´m trying to find out all equally named nodes for two different XMLs, saving the found node names inside a StringList and use this list for the user to choose between these nodes.
The idea behind all of this is simple: When I try to overload a section, I now want two big XMLs to switch certain sections, that both of them have(at least equally named sections). The user can choose the section he wants to switch from a Combobox. To prevent mistakes, I figured out, I would compare all nodes first and offer the user just the equally named sections, found in both XMLs, so I prevent the user from trying to switch the <info>-section from XML A with the <person>-section in XML B.
So far,so good. It works just fine, but I have a big performance problem:If both XMLs have more than 300 nodes each, comparing them still works fine, but it takes a lot of time before the program continues its work. The user GUI is frozen until the node comparing process is done. Now, I worked around it so far by warning the user, if a certain number of nodes is found in both documents and I let him type in the desired node name manually. (of course I then check if both documents have this node ;) )
But is there an easier/better way to find out all equally named sections for two different XMLs?
Here´s my code(which works just fine except the performance for bigger XMLs):



I use this method to create two ArrayLists holding ALL nodes of the corresponding XML file.
Afterwards, I compare the two ArrayLists just like this:



And that´s exactly where the performance problem lies. Does anyone have a suggestion how to better the performance issue?

Greetings,

Randy
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
What exactly does this piece of code does?



Seems you just create an ArrayList, add items to it and abandon it after the loop terminates.
Randy Miller
Ranch Hand

Joined: Feb 13, 2012
Posts: 44
Hi John!

Sorry, my mistake ;)
Copy&Paste
Just fixed the code block and copied the correct code up above!

Of course, the for loop you saw there before, was completly irrelevant ;)
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
You can use the contains() method in List rather than iterating the second List.

Sample :


Check where the performance issue occurs, in parsing the DOM or checking equal nodes.
Randy Miller
Ranch Hand

Joined: Feb 13, 2012
Posts: 44
Wow! Didn´t think it was that easy
My performance is now through the roof
Thank you John for your code snippet! WOrked perfectly fine!

Here´s the winner:

Drew Liscomb
Greenhorn

Joined: Sep 20, 2010
Posts: 7
This was interesting to me, because intuitively the original 2 'for-loops' and the 'for-loop' with List.contains() should operate at about the same speed. Ignoring the case-sensitivity issue for a moment, ArrayList.contains() does what the inner loop does, comparing each item in List2 to the current item in List1 until a match is found. It seems like there's some other side-effect causing the noticeable performance hit. In either case, it seems to me that this search should operate in O(n^2).

Having said that, be sure that the change
from myStringList.get(x).equalsIgnoreCase(myReplaceStringList.get(y))
to myReplaceStringList.contains(myStringList.get(x))
does what you want it to do. List.contains(Object o) is defined to compare using o.equals(), which will not ignore case for String objects.

To keep the case-insensitivity (and potentially improve performance), you could use a HashSet of upper-cased items and test for membership like this:


This method should operate in O(n). For smaller numbers of items, it might not be worth it to calculate all the hash codes. However, if your application needs to scale, it might be worth it to make a test with lists of size 10^6 strings.

Kind regards,
Drew
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
Good thought Drew. Welcome (back) to Ranch
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Performance Problem Searching Equally Named Nodes In Two XMLs
 
Similar Threads
need help
Overwriting a specific XML section
Updating XML
Save XML nodes in a String array for ChooseNodes
Huffman encoding/Decoding