XML parsers to garbage collect the part of the xml which is parsed

Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
Is it possible to garbage collect the part of an XML document that has already been parsed, using existing XML parsers?

Consider the following use case.
I have a JVM with a limited heap (say 100 MB, set by command-line options). I have a large XML document in memory, say 75 MB, held in a String, StringBuffer or StringBuilder reference.
Now I need to parse the document (either DOM or SAX) and build a complex Java object (which can contain collections) out of it. Once I have built the Java object, the XML document is no longer needed.

But I cannot do that: as I parse the document, I allocate more and more memory for the Java object, and because the entire XML document is also still in memory, I reach the heap limit and get an OutOfMemoryError.

If the portion of the XML document I have already parsed could be garbage collected, say after every 5 MB parsed, it would leave more memory for the Java object.


Any help would be highly appreciated.
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
The following InputStream would be passed to the SAX parser; I think it will work.

Please comment if you see any mistake.
Also, let me know if there are better mechanisms.
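
A minimal sketch of what such a stream could look like, not a definitive implementation. It assumes the document is held as a queue of String chunks rather than one StringBuilder, so that each chunk becomes unreachable as soon as it has been read. The class names ChunkReleasingInputStream and ChunkedParseDemo are hypothetical, and the sketch assumes UTF-8 and that no chunk boundary splits a surrogate pair.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stream that serves a chunked document and drops each chunk once it is loaded. */
class ChunkReleasingInputStream extends InputStream {
    private final Deque<String> chunks;   // chunks that have not been read yet
    private byte[] current = new byte[0]; // encoded bytes of the chunk being served
    private int pos = 0;

    ChunkReleasingInputStream(Deque<String> chunks) {
        this.chunks = chunks;
    }

    /** Makes sure unread bytes are available; returns false at end of input. */
    private boolean fill() {
        while (pos >= current.length) {
            String next = chunks.poll(); // removes the queue's reference to the chunk
            if (next == null) {
                return false;
            }
            // Assumption: chunks never split a surrogate pair, so each encodes cleanly on its own.
            current = next.getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        return true;
    }

    @Override
    public int read() {
        return fill() ? (current[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, buf, off, n);
        pos += n;
        return n;
    }
}

class ChunkedParseDemo {
    /** Parses the chunked document with SAX and returns the number of elements seen. */
    static int countElements(Deque<String> chunks) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ChunkReleasingInputStream(chunks),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local, String qName, Attributes atts) {
                        count[0]++;
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        Deque<String> chunks = new ArrayDeque<>(
            Arrays.asList("<root><item>a</it", "em><item>b</item>", "</root>"));
        System.out.println("elements: " + countElements(chunks));
        System.out.println("chunks left in queue: " + chunks.size());
    }
}
```

Whether the memory is really reclaimed depends on nothing else holding a reference to the chunks, and on when the collector runs.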

William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
    
If this was my problem I would certainly consider writing the document to a temporary file.

If you use the java.io.File.createTempFile() method it takes care of finding a location, and the deleteOnExit() should do the cleanup for you.
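
A rough sketch of the temporary-file route, under the assumption that the document sits in a StringBuilder and is written out as UTF-8. The class name TempFileSpill and the "xmldoc" prefix are made up for illustration. The builder is copied in small slices so that no second full-size copy of the document is created along the way.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class TempFileSpill {
    /** Writes the in-memory document to a temp file that is deleted when the JVM exits. */
    static File spill(StringBuilder xml) {
        try {
            File tmp = File.createTempFile("xmldoc", ".xml"); // prefix and suffix are arbitrary
            tmp.deleteOnExit();
            try (Writer w = Files.newBufferedWriter(tmp.toPath(), StandardCharsets.UTF_8)) {
                char[] buf = new char[8192];
                for (int i = 0; i < xml.length(); i += buf.length) {
                    int n = Math.min(buf.length, xml.length() - i);
                    xml.getChars(i, i + n, buf, 0); // copy one slice out of the builder
                    w.write(buf, 0, n);
                }
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder xml = new StringBuilder("<root><a/></root>");
        File tmp = spill(xml);
        xml = null; // drop the only reference so the document can be collected
        // Parse from the file; a real handler would build the Java object here.
        SAXParserFactory.newInstance().newSAXParser().parse(tmp, new DefaultHandler());
        System.out.println("parsed " + tmp.length() + " bytes from " + tmp.getName());
    }
}
```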

Furthermore, I'm sure StringBuilder delete is not going to immediately free up memory, just changes internal pointers.

Bill
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
If this was my problem I would certainly consider writing the document to a temporary file.

Yes, it was one of the first things that came to mind. But I wanted to do it in memory for performance reasons.


Furthermore, I'm sure StringBuilder delete is not going to immediately free up memory, just changes internal pointers.


I checked with the following, with a limit on the heap (on the HotSpot VM; not sure how other VMs will behave).
It seemed to free up memory, at least when it reached the heap upper limit.


If you use the java.io.File.createTempFile() method it takes care of finding a location, and the deleteOnExit() should do the cleanup for you.

I wasn't aware of the deleteOnExit() API. Thanks for letting me know.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835

Ah yesssss, here is the code from the parent class java.lang.AbstractStringBuilder. It looks like it just moves characters within the existing buffer and adjusts the count, so the memory used for the buffer should stay the same.
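
A small check that is consistent with this reading, offered as a sketch rather than as the JDK source: it compares capacity() before and after delete(), and again after trimToSize(). The class name BuilderCapacityDemo is made up, and the exact numbers depend on the JDK in use.

```java
class BuilderCapacityDemo {
    /** Returns {capacity before delete, capacity after delete, capacity after trimToSize}. */
    static int[] capacities() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append('x');
        }
        int before = sb.capacity();
        sb.delete(0, 900);              // length drops to 100
        int afterDelete = sb.capacity(); // the backing buffer is not shrunk by delete()
        sb.trimToSize();                // asks the builder to shrink its buffer to the content
        int afterTrim = sb.capacity();
        return new int[] {before, afterDelete, afterTrim};
    }

    public static void main(String[] args) {
        int[] c = capacities();
        System.out.println("before delete:    " + c[0]);
        System.out.println("after delete:     " + c[1]);
        System.out.println("after trimToSize: " + c[2]);
    }
}
```

Note that shrinking a buffer means allocating a smaller array and copying into it, so the old and new arrays coexist for a moment.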



Bill
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
But I cannot do that: as I parse the document, I allocate more and more memory for the Java object, and because the entire XML document is also still in memory, I reach the heap limit and get an OutOfMemoryError.


If you process the XML-based document with a SAX implementation, you will never have the "entire XML document in memory."


If you hit a memory problem with temporary data objects that are created from the XML data, then you need to identify some point in the processing to "do something" with the temporary data objects, get rid of them, create new ones, and so on. This can only be done with a SAX-based implementation and some type of permanent storage mechanism, e.g. a file or a relational database.
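
One way to picture this, as a sketch under assumptions: a SAX handler that collects the text of each <item> element into a small batch and hands the batch to a sink whenever it is full, so that only one batch is held in memory at a time. The element name "item", the class name BatchingHandler and the sink are hypothetical; in practice the sink would write to a file or database.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class BatchingHandler extends DefaultHandler {
    private final int batchSize;
    private final Consumer<List<String>> sink; // stands in for permanent storage
    private List<String> batch = new ArrayList<>();
    private StringBuilder text;                // non-null only while inside an <item>

    BatchingHandler(int batchSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("item".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("item".equals(qName)) {
            batch.add(text.toString());
            text = null;
            if (batch.size() >= batchSize) {
                flush();
            }
        }
    }

    @Override
    public void endDocument() {
        if (!batch.isEmpty()) {
            flush(); // hand over the final, partially filled batch
        }
    }

    private void flush() {
        sink.accept(batch);
        batch = new ArrayList<>(); // start a fresh batch; the old one can be collected
    }

    /** Parses the XML and returns the size of each batch passed to the sink. */
    static List<Integer> batchSizes(String xml, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new BatchingHandler(batchSize, b -> sizes.add(b.size())));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        String xml = "<r><item>1</item><item>2</item><item>3</item></r>";
        System.out.println("batch sizes: " + batchSizes(xml, 2));
    }
}
```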
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
If you process the XML-based document with a SAX implementation, you will never have the "entire XML document in memory."

True, if you have a stream: then neither is the document entirely in memory, nor will SAX bring it entirely into memory.

But that doesn't apply to parsing an XML document which is already in memory and getting rid of the part of the document which has already been parsed.

Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
The motivation for this question is the following.

In Apache Axis, consider the following scenarios.

Web service invoked (server received a request from a client)
The client made a web service call with a large object as a parameter. Axis now needs to convert the request (XML) to a Java type.
Instead of parsing the Java types from the stream, it seemed to read everything into memory and then start parsing it. I cannot afford this, as memory is a big constraint.


Making a web service request (client making a request to the server)
I am making a web service call with a large object as a parameter, so I have this large object in memory, and Axis is trying to build the XML string in memory instead of building part of it and streaming it to the server.


Can I configure Apache Axis for this type of parsing?
Are there any libraries available to do this?



Please check my other posts too:
http://www.coderanch.com/t/528622/Web-Services/java/Axis-Parser
http://www.coderanch.com/t/528569/Web-Services/java/client-response-parsing-behaviour
Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187
XML parsing in general comes with overhead. It sounds like this SOAP-based web service is not designed efficiently and your memory problem stems from attempting to send a single large object as a web service argument. Maybe the decision to use a web service to integrate these applications was not the best. Other alternatives do exist, e.g. JMS and messaging, FTP, etc.

If you could redesign the web service to take smaller pieces of data, you could stream the data to the receiving application with multiple web service calls instead of one giant 70 MB call. This is a much better option than fiddling around with JVM settings or awkward code, or attempting to interfere with Axis Engine processing.
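
As an illustration only, the splitting could look something like the sketch below. The sendBatch callback stands in for whatever redesigned web service operation would accept a smaller piece of data; it and the class name BatchedSender are hypothetical, not part of Axis.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class BatchedSender {
    /**
     * Passes the items to sendBatch in consecutive slices of at most batchSize
     * elements and returns the number of calls made.
     */
    static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> sendBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int calls = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            sendBatch.accept(items.subList(from, to)); // a view of the list, not a copy
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5);
        // Hypothetical stand-in for one web service call per batch.
        int calls = sendInBatches(items, 2, batch -> System.out.println("sending " + batch));
        System.out.println("calls made: " + calls);
    }
}
```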

As an aside, Axis is open source, so you are free to create your own version of Axis.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
This is obviously a job for SAAJ - SOAP with Attachments API for Java.

The gross inefficiencies of moving large chunks of data as XML inside the SOAP body motivated the creation of SAAJ.

An attachment can in fact be a serialized Java object.

Bill
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
But isn't SAAJ supposed to be used for attachments such as binaries (images, for example) and large XML files?
In this case, the web service method parameters are huge objects. We could turn those parameters into attachments, serialize and deserialize them, and then do the actual business logic.
But I think this ties us to a Java-only implementation and is very implementation specific.
Also, we would need such an implementation for every method that has to be exposed as a web service.

Is there a better alternative? Can we meet this requirement with the JAX-WS RPC mechanism?


William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
Let's back up a bit. This is the first time you have mentioned a concern about a Java-only implementation.

EXACTLY what do your clients require?

If this was my project I would probably be filling a whiteboard with cloud symbols and arrows and other abstract stuff until I was sure I knew exactly what the clients wanted.

Bill
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
This is the first time you have mentioned a concern about a Java-only implementation.


I meant that by doing it the SAAJ way, I'll end up with an attachment for every parameter.
That way, it is more like using HTTP for communication than actually using SOAP (which of course runs on top of HTTP too).
So it would be like using a web service just because I wanted to do it over HTTP, for which better alternatives exist.

Let's back up a bit.

It's because I posted the findings of my research into this, so it ended up here.

It is more that I am interested in doing it the proper way, not some specific client need.

Anyway, I'll try out every possible way, and I'll let you guys know the result.
Thanks for the help, I appreciate it.

 