
Need to break a huge XML file into smaller groups one by one without loading the whole XML

 
Tanveer Rameez
Ranch Hand
Posts: 158
Hi,
I have an XML file with the following lines:

Now this XML data is huge, i.e. there are hundreds (maybe thousands) of hotel elements, i.e. <hotel>...</hotel>. I want to process the hotel elements, but loading the whole XML data (all the hotels) would require a huge amount of memory. So what I want to do is extract only one <hotel> element, i.e. all the data between <hotel> and </hotel>, and send it to the output stream. If the output stream is full, I will wait until it becomes empty before sending more data to it. After that I obtain the next <hotel> element, and so on. This avoids loading all the <hotel> elements into memory. The output stream will be piped to the input of another class, which will process the data for a hotel. So the other class gets the data for one hotel at a time on its input stream.
I could find 3 ways of doing this:
1. DOM: I cannot use the Java DOM API because it loads the entire XML data into memory.
2. SAX: If I use the Java SAX API, I have to recreate the entire <hotel>...</hotel> myself. Plus I want to control when I receive the events; the SAX API fires events as it parses, not when I ask for them. Note that I have to split the XML data and pass it to a stream, so it would be good if I could get the <hotel>...</hotel> data without much processing.
3. XML pull parsing, e.g. MXP (http://www.extreme.indiana.edu/xgws/xsoap/xpp/mxp1/index.html).
This API allows me to control when I get the next event:

So whenever I encounter a start tag <hotel>, I put all the data following that tag into the output stream until I encounter the end tag </hotel>. Then I send the output stream to another class and wait until the output stream is empty before repeating the process for the next hotel. But the problem is that I have to recreate the entire XML data between <hotel> and </hotel> before sending it to the output stream.
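A minimal sketch of the pull approach described above, using the JDK's built-in StAX API (javax.xml.stream) instead of MXP, which is a swapped-in assumption and not what the post uses. The class name HotelSplitter, the method split, and the sample XML are hypothetical. Copying each event into an XMLEventWriter lets the library serialize the fragment, so the markup between <hotel> and </hotel> does not have to be rebuilt by hand, and only one hotel is held in memory at a time.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: pulls events one at a time and emits each <hotel> fragment.
public class HotelSplitter {

    /** Hands every complete <hotel>...</hotel> fragment to the sink, one at a time. */
    public static void split(Reader in, Consumer<String> sink) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            boolean hotelStart = event.isStartElement()
                    && "hotel".equals(event.asStartElement().getName().getLocalPart());
            if (!hotelStart) {
                continue; // skip everything outside a <hotel> element
            }
            StringWriter buffer = new StringWriter();
            XMLEventWriter writer = outFactory.createXMLEventWriter(buffer);
            int depth = 0; // tracks nesting so the matching end tag is found
            while (true) {
                writer.add(event); // the writer serializes the event for us
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    break; // reached the end tag that closes this hotel
                }
                event = reader.nextEvent();
            }
            writer.flush();
            writer.close();
            sink.accept(buffer.toString());
        }
        reader.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample data is an assumption; the original file layout is not shown in the post.
        String xml = "<hotels><hotel><name>A</name></hotel>"
                + "<hotel><name>B</name></hotel></hotels>";
        List<String> parts = new ArrayList<>();
        split(new StringReader(xml), parts::add);
        parts.forEach(System.out::println);
    }
}
```

Because the caller decides when to ask for the next event, the sink can take as long as it needs with one hotel before the parser reads any further.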
I know I may sound confusing, but I am not an expert in XML or the Java XML APIs. In short, I have an input stream of huge XML data, and I want to break it up into smaller pieces (based on a tag) and pass them to the output stream.
I want to do this step by step: obtain the first piece and send it, wait until the output stream is empty, then obtain the next piece and continue. Extracting the pieces one by one avoids loading the whole data into memory.
Please help! If you know any way other than the 3 I wrote above, please suggest it.
Thanks in advance
Tanveer
 
Lasse Koskela
author
Sheriff
Posts: 11962
Would it be acceptable to have the output stream block the parsing thread?
 
Tanveer Rameez
Ranch Hand
Posts: 158
Originally posted by Lasse Koskela:
Would it be acceptable to have the output stream block the parsing thread?


Well, if that can be done, yes.
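For what it is worth, blocking the parsing thread on the output stream could be sketched with the standard java.io piped streams: a write to a PipedOutputStream waits while the connected PipedInputStream's buffer is full and resumes once the reader has consumed some data. The class name BlockingPipeDemo, the method transfer, and the length-prefix framing are assumptions made for illustration; nothing in the thread prescribes them.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: the producer (parsing side) blocks whenever the pipe is full.
public class BlockingPipeDemo {

    /** Sends each fragment through a pipe of the given size; returns what the consumer read. */
    public static List<String> transfer(List<String> fragments, int pipeSize)
            throws IOException, InterruptedException {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut, pipeSize);
        List<String> received = new ArrayList<>();

        // Consumer thread: stands in for the class that processes one hotel at a time.
        Thread consumer = new Thread(() -> {
            try (DataInputStream in = new DataInputStream(pipeIn)) {
                while (true) {
                    int length = in.readInt();   // length prefix marks fragment boundaries
                    byte[] data = new byte[length];
                    in.readFully(data);
                    received.add(new String(data, StandardCharsets.UTF_8));
                }
            } catch (EOFException endOfPipe) {
                // The producer closed the pipe: no more fragments.
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        consumer.start();

        // Producer: each write blocks while the pipe buffer is full.
        try (DataOutputStream out = new DataOutputStream(pipeOut)) {
            for (String fragment : fragments) {
                byte[] data = fragment.getBytes(StandardCharsets.UTF_8);
                out.writeInt(data.length);
                out.write(data);
            }
        }
        consumer.join(); // join makes the consumer's writes to 'received' visible here
        return received;
    }

    public static void main(String[] args) throws Exception {
        List<String> hotels = Arrays.asList("<hotel>A</hotel>", "<hotel>B</hotel>");
        // A deliberately tiny pipe (8 bytes) forces the producer to wait for the consumer.
        transfer(hotels, 8).forEach(System.out::println);
    }
}
```

With a pull parser on the producer side, a blocked write simply pauses parsing, so no further hotel elements are read until the consumer catches up.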
 