Remove multiple occurrences of XML nodes

 
Greenhorn
Posts: 10
I need to write a Java method that removes multiple occurrences of a node (and its contents) from an XML document (supplied as a String).

Here's a sample of the XML



I need to remove all occurrences of the element "OLifEExtension" and its contents. I've written the fairly simple method below. It works, but it is very inefficient and takes a long time when the XML is large (10 MB or more).



I've also tried regular expressions but can't come up with one that works. I've tried the following:

1. <OLifEExtension[^>]+>.+?</OLifEExtension>
2. <OLifEExtension .*?>.*?</OLifEExtension>
3. <OLifEExtension[^>]+/>|<OLifEExtension[^>]+>.+</OLifEExtension>

None of the above regular expressions work. Instead of matching just the first "OLifEExtension" element, the match covers everything from the first opening "OLifEExtension" tag to the last closing "OLifEExtension" tag.
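Two details commonly trip up patterns like these. Without `Pattern.DOTALL`, `.` does not match line terminators, so a reluctant `.+?` cannot reach a closing tag on a later line. A greedy `.+` (as in the third pattern) runs to the last `</OLifEExtension>` in the input instead of the first. Also, `[^>]+` can match the `/` of a self-closing tag, so that tag gets treated as an opening tag. A minimal sketch, assuming `OLifEExtension` elements are never nested (a regular expression cannot track nesting) and the tag name never appears inside comments, CDATA sections or attribute values; `RegexRemover` and the sample input are made up for illustration:

```java
import java.util.regex.Pattern;

public class RegexRemover {
    // Hypothetical helper. Assumes OLifEExtension elements are not nested and that
    // '>' does not occur inside their attribute values.
    private static final Pattern EXTENSION = Pattern.compile(
            "<OLifEExtension\\b[^>]*/>"                          // self-closing form, tried first
          + "|<OLifEExtension\\b[^>]*>.*?</OLifEExtension\\s*>", // reluctant: stops at the FIRST end tag
            Pattern.DOTALL);                                     // lets '.' match line breaks too

    public static String removeExtensions(String xml) {
        // \b keeps names such as <OLifEExtensionFoo> from matching
        return EXTENSION.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<OLifE>\n<OLifEExtension VendorCode=\"1\">\n<A/>\n</OLifEExtension>\n"
                + "<Party/>\n<OLifEExtension VendorCode=\"2\"/>\n</OLifE>";
        System.out.println(removeExtensions(xml));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it for every document.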

Can anyone please suggest a more efficient way of doing this, or provide a regular expression that will do the job?

Many many thanks in advance.
[ December 14, 2008: Message edited by: Tausif Farooqi ]
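Another option, not suggested in the thread, is a streaming (StAX) parser from the JDK's `javax.xml.stream` package. It does not care how the document is laid out and it handles nested elements, but it requires well-formed input and re-serializes the document, so quoting, empty-element syntax and escaping may differ from the original text. A sketch, with `StaxElementRemover` as an illustrative name:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxElementRemover {
    // Illustrative helper: copies the document event by event, dropping every element
    // whose local name (prefix ignored) equals `name`, together with its contents.
    public static String removeElements(String xml, String name) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter buffer = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(buffer);
            int depth = 0;                                  // > 0 while inside a removed element
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (depth > 0) {
                    // Count nested tags so skipping stops at the matching end tag
                    if (event.isStartElement()) {
                        depth++;
                    } else if (event.isEndElement()) {
                        depth--;
                    }
                    continue;
                }
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(name)) {
                    depth = 1;                              // <name/> also produces an end event
                    continue;
                }
                if (event.isStartDocument() || event.isEndDocument()) {
                    continue;                               // the XML declaration is not copied
                }
                writer.add(event);
            }
            writer.flush();
            writer.close();
            reader.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not filter the XML", e);
        }
    }
}
```

Because it streams events instead of building a tree, memory use stays modest even for inputs of 10 MB or more.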
 
Author and all-around good cowpoke
Posts: 13078
It's probably slow because you modify the whole string on every cycle, which creates lots of large objects.
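Bill's point can be illustrated with a single pass that never rebuilds the whole string: it finds each element with `indexOf` and appends only the text worth keeping to one `StringBuilder`. This is a sketch, not the method the poster eventually used; `ExtensionStripper` and `removeElements` are illustrative names, and it assumes the elements are not nested and that `>` does not occur inside their attribute values.

```java
public class ExtensionStripper {
    // Hypothetical helper: removes every <name ...>...</name> and <name .../> in one pass.
    // Assumes these elements are not nested and '>' never appears in their attributes.
    public static String removeElements(String xml, String name) {
        String open = "<" + name;
        String close = "</" + name + ">";
        StringBuilder out = new StringBuilder(xml.length());
        int pos = 0;                                   // start of the not-yet-copied text
        int start = xml.indexOf(open, pos);
        while (start >= 0) {
            int after = start + open.length();
            char c = after < xml.length() ? xml.charAt(after) : '\0';
            if (c != '>' && c != '/' && !Character.isWhitespace(c)) {
                start = xml.indexOf(open, after);      // e.g. <OLifEExtensionFoo>: not ours
                continue;
            }
            int gt = xml.indexOf('>', after);
            if (gt < 0) {
                break;                                 // malformed tail: copy it unchanged
            }
            int end;
            if (xml.charAt(gt - 1) == '/') {
                end = gt + 1;                          // self-closing element
            } else {
                int closeAt = xml.indexOf(close, gt);
                if (closeAt < 0) {
                    break;                             // no end tag: copy the rest unchanged
                }
                end = closeAt + close.length();
            }
            out.append(xml, pos, start);               // keep the text before the element
            pos = end;                                 // skip the element itself
            start = xml.indexOf(open, pos);
        }
        out.append(xml, pos, xml.length());            // keep whatever is left
        return out.toString();
    }
}
```

Each character is copied at most once, so the cost grows roughly linearly with the size of the input rather than with size times the number of removed elements.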

If the XML really is formatted that regularly, you could read it line by line (see java.io.BufferedReader and java.io.StringReader), writing to a java.io.StringWriter but skipping the lines between the start and end tags.
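A minimal sketch of the line-by-line approach, assuming each `<OLifEExtension ...>` start tag and its `</OLifEExtension>` end tag sit on lines of their own (or the whole element fits on one line) and that the elements are not nested. `LineFilter` and `skipElementLines` are illustrative names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class LineFilter {
    // Illustrative helper: drops the lines making up <name ...>...</name> elements.
    // Assumes one tag per line (or a whole element on one line) and no nesting.
    // Every output line ends with '\n', whatever line endings the input used.
    public static String skipElementLines(String xml, String name) {
        String closeTag = "</" + name + ">";
        StringWriter out = new StringWriter();
        boolean skipping = false;                  // true between a start tag and its end tag
        try (BufferedReader in = new BufferedReader(new StringReader(xml))) {
            String line;
            while ((line = in.readLine()) != null) {
                String t = line.trim();
                if (skipping) {
                    if (t.equals(closeTag)) {
                        skipping = false;          // end tag found: resume copying after it
                    }
                    continue;
                }
                if (isStartTag(t, name)) {
                    // Keep skipping only if the element does not also end on this line
                    skipping = !(t.endsWith("/>") || t.endsWith(closeTag));
                    continue;
                }
                out.write(line);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);     // not expected when reading from a String
        }
        return out.toString();
    }

    // True for "<name>", "<name attr=...>" or "<name/>", but not for "<nameSuffix>"
    private static boolean isStartTag(String t, String name) {
        String open = "<" + name;
        if (!t.startsWith(open) || t.length() == open.length()) {
            return false;
        }
        char next = t.charAt(open.length());
        return next == '>' || next == '/' || Character.isWhitespace(next);
    }
}
```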

Bill
 
Tausif Farooqi
Greenhorn
Posts: 10
Thanks for the suggestion, Bill, but the problem is that I can't assume the XML will be formatted that regularly, as it's coming from an external source. I can try putting line breaks between every adjacent ">" and "<", then try what you've suggested and see if it makes a difference.
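The line-break idea can be sketched with a single `String.replace`, which replaces every literal occurrence of `"><"`. `TagLineBreaker` is an illustrative name. Note the caveats: text between tags stays on the same line as those tags, any `><` inside comments, CDATA sections or attribute values is rewritten too, and the document's whitespace changes, which matters only if that whitespace is significant.

```java
public class TagLineBreaker {
    // Illustrative helper: inserts a line break wherever one tag directly follows another,
    // so "<a><b/></a>" becomes "<a>\n<b/>\n</a>". Tags already separated by whitespace
    // are left as they are.
    public static String breakBetweenTags(String xml) {
        return xml.replace("><", ">\n<");   // literal replacement of every occurrence
    }

    public static void main(String[] args) {
        System.out.println(breakBetweenTags(
                "<OLifE><OLifEExtension><A>x</A></OLifEExtension><Party/></OLifE>"));
    }
}
```

The result could then be filtered line by line as Bill suggested, as long as each start tag, end tag and text run ends up on a line of its own.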
 
Tausif Farooqi
Greenhorn
Posts: 10
Hi Bill, you were right about the String concatenation part! I changed the method to this:

And it runs nearly 400 times faster than the previous method! Thanks for the help!
[ December 14, 2008: Message edited by: Tausif Farooqi ]
 