Remove multiple occurrences of XML nodes

Tausif Farooqi
Greenhorn

Joined: Mar 03, 2007
Posts: 10
I need to write a Java method that removes every occurrence of a node (and its contents) from an XML document supplied as a String.

Here's a sample of the XML:



I need to remove all occurrences of the element "OLifEExtension" and its contents. I've written the fairly simple method given below. It works, but it is very inefficient and takes a long time if the XML is large (>= 10 MB).



I've also tried regular expressions but can't find one that works. I've tried the following:

1. <OLifEExtension[^>]+>.+?</OLifEExtension>
2. <OLifEExtension .*?>.*?</OLifEExtension>
3. <OLifEExtension[^>]+/>|<OLifEExtension[^>]+>.+</OLifEExtension>

None of the above regular expressions works. Instead of matching only the first "OLifEExtension" element, each one matches everything between the first opening "OLifEExtension" tag and the last closing "OLifEExtension" tag.

Can anyone please suggest a more efficient way of doing this, or provide a regular expression that will do the job?

Many many thanks in advance.
[ December 14, 2008: Message edited by: Tausif Farooqi ]
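On the regular expressions listed in the question: in Java, `.` does not match line terminators unless `Pattern.DOTALL` is set, and pattern 3 uses the greedy `.+`, which runs on to the last closing tag. The sketch below combines `DOTALL` with a reluctant `.*?`. It is a rough illustration, not a general XML solution: it assumes "OLifEExtension" elements are never nested inside each other and that attribute values contain no `>`. The class name and the sample input are made up for the example.

```java
import java.util.regex.Pattern;

class RegexStrip {
    // First alternative: self-closing element, e.g. <OLifEExtension/>.
    // Second alternative: start tag, then as little as possible (reluctant .*?),
    // up to the first closing tag. DOTALL lets '.' cross line breaks.
    private static final Pattern EXT = Pattern.compile(
            "<OLifEExtension\\b[^>]*/>|<OLifEExtension\\b[^>]*>.*?</OLifEExtension\\s*>",
            Pattern.DOTALL);

    static String strip(String xml) {
        return EXT.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input; the attribute name is invented for illustration.
        String xml = "<A>\n<OLifEExtension VendorCode=\"1\">\n<X/>\n</OLifEExtension>\n"
                + "<B/>\n<OLifEExtension/>\n</A>";
        System.out.println(strip(xml));
    }
}
```

Only the matched elements are removed, so the line breaks that surrounded them stay in the output.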
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
    
    5
Probably the reason it is slow is that you modify the whole string on every cycle, which creates a lot of large objects.

If the XML is really formatted that regularly, you could read it line by line (see java.io.BufferedReader and java.io.StringReader), writing to an output java.io.StringWriter but skipping the lines between the start and end tags.

Bill
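Bill's line-by-line suggestion could be sketched roughly as follows. This is only an illustration under the assumption he states: each "OLifEExtension" start tag and end tag sits on its own line. The class and method names are made up for the example, and the check is a plain prefix test, so an element whose name merely begins with "OLifEExtension" would be skipped as well.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class LineFilter {
    static String strip(String xml) {
        BufferedReader in = new BufferedReader(new StringReader(xml));
        StringWriter out = new StringWriter(xml.length());
        boolean skipping = false;
        String line;
        try {
            while ((line = in.readLine()) != null) {
                String t = line.trim();
                if (!skipping && t.startsWith("<OLifEExtension")) {
                    // Element opened and closed on the same line: drop just this line.
                    // Otherwise keep skipping until the end tag shows up.
                    skipping = !(t.endsWith("/>") || t.endsWith("</OLifEExtension>"));
                    continue;
                }
                if (skipping) {
                    if (t.startsWith("</OLifEExtension")) {
                        skipping = false;
                    }
                    continue;
                }
                out.write(line);
                out.write('\n');
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory String.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Each kept line is written exactly once, so no large intermediate strings are created.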
Tausif Farooqi
Greenhorn

Joined: Mar 03, 2007
Posts: 10
Thanks for the suggestion, Bill, but the problem is that I can't assume the XML will be properly formatted, as it's coming from an external source. I can try putting line breaks between every adjacent ">" and "<", then try what you've suggested and see if it makes a difference.
Tausif Farooqi
Greenhorn

Joined: Mar 03, 2007
Posts: 10
Hi Bill, you were right about the String concatenation part! I changed the method to this:

It runs nearly 400 times faster than the previous method! Thanks for the help!
[ December 14, 2008: Message edited by: Tausif Farooqi ]
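The revised method itself is not preserved in this thread. As a hedged sketch of the general idea (scan the input once and append the kept pieces to a StringBuilder instead of rebuilding the whole String on every cycle), something like the following might do. It is not the poster's code: the class name, method signature and sample input are invented, nested elements of the same name are not handled, and an element whose name merely starts with the given name would also be removed.

```java
class SinglePassStrip {
    static String strip(String xml, String name) {
        String open = "<" + name;
        String close = "</" + name + ">";
        StringBuilder out = new StringBuilder(xml.length());
        int pos = 0; // start of the text not yet copied
        while (true) {
            int start = xml.indexOf(open, pos);
            if (start < 0) {
                break; // no more elements to remove
            }
            int tagEnd = xml.indexOf('>', start);
            if (tagEnd < 0) {
                break; // malformed start tag: keep the rest unchanged
            }
            out.append(xml, pos, start); // copy the text before the element
            if (xml.charAt(tagEnd - 1) == '/') {
                pos = tagEnd + 1; // self-closing element
            } else {
                int end = xml.indexOf(close, tagEnd);
                if (end < 0) {
                    pos = start; // no end tag: keep the rest unchanged
                    break;
                }
                pos = end + close.length();
            }
        }
        out.append(xml, pos, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input; the attribute is invented for illustration.
        String xml = "<A><OLifEExtension a=\"1\"><X/></OLifEExtension><B/><OLifEExtension/></A>";
        System.out.println(strip(xml, "OLifEExtension"));
    }
}
```

Because every character is copied at most once, the work grows linearly with the size of the input, which fits Bill's diagnosis that repeatedly modifying the whole string was the bottleneck.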