Best option for XML mapping w.r.t performance

 
Ranch Foreman
Posts: 275
Hi

I have an application where XML contains data that is accessed frequently.
The current implementation uses XPath and DOM. (I assume so because an org.w3c.dom.Document object is created for each XML file and data is read from it using the XPath classes.)
This process takes time.
Is there a better way to do this?
I have worked with Castor objects earlier and found them convenient, but I am not sure about their relative performance.
Please shed some light on this.

Thanks in advance
 
Aniruddh Joshi
Ranch Foreman
Posts: 275
Resolved. I read this comparison report.
 
Author and all-around good cowpoke
Posts: 13078
I wouldn't be so sure. That report is about 8 years old - perhaps 3 generations of XML toolkits ago.

That's even before "pull" parsers got started.

XPath is very slow in general - not surprising considering all the work it has to do.
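
For readers who have not met a "pull" parser, here is a minimal sketch using StAX (javax.xml.stream), which ships with the JDK. It is only an illustration, not anything taken from the report or the original application: the class name PullLookup, the helper firstText and the sample XML are made up, and it assumes the value you want sits in a text-only element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams through the XML and stops at the first match,
// so no DOM tree is ever built.
class PullLookup {

    /** Returns the text of the first element named localName, or null if absent. */
    static String firstText(String xml, String localName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    // getElementText() reads up to the matching end tag and
                    // throws if the element contains child elements.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample document.
        String xml = "<order><id>42</id><customer><name>Ann</name></customer></order>";
        System.out.println(firstText(xml, "name"));
    }
}
```

The trade-off is that every lookup re-reads the stream, so this suits one-off extraction better than repeated random access to the same document.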
 
Ranch Hand
Posts: 70
I have been facing an issue where I have a huge schema, and my requirement would be best served by an XML binding API (preferably Castor) so that I could directly unmarshal the required entities and consume them in my application.

Since the schema is huge, the number of classes generated by the Castor source generator exceeds 5000, ultimately leading to an OutOfMemory error (-Xms1024m -Xmx1024m). I am not sure whether the count itself is the issue here, but I definitely do not need most of the unmarshalled types. Since Castor does not have any direct way of retrieving only the required types (along with their dependent types), I have no control over the number of types unmarshalled and am a little overwhelmed weighing my options.

If anyone has a better suggestion for this problem, or can point out something I have overlooked in the Castor API, it would be of great help.

PS: Castor 1.3

- Thanks
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
I don't understand why you are considering a mapping to Java classes at all. The answers to the following might help:

Out of all this huge XML schema, exactly how much data do you need?

Are you only going to pluck out a few values?

Do you have to modify the source XML to create a modified document?

Bill
 
Nitin Pathak
Ranch Hand
Posts: 70
Thanks Bill! The mapping to Java classes is essentially meant to provide a reusable solution to the clients, since the schema being used is an industry standard and has an exhaustive set of types to provide the required XML richness.

To answer your questions:

Out of all this huge XML schema, exactly how much data do you need?
Currently, I need nearly 10% of the data. However, to be a reusable solution, it should be able to read the exhaustive set of types from the schema as the application grows.

Are you only going to pluck out a few values?
For now, yes! Though it would be helpful to be able to read the entire set of types, for now I would go for a few.

Do you have to modify the source XML to create a modified document?
No. The source XML will not be modified, and it needs to be validated against the schema (maybe with Xerces validation) before I (plan to) use the unmarshalled Java objects.

I tried JAXB with a few custom bindings, which resulted in the same memory exceptions. I would really appreciate your response.

- Thanks
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
In view of those requirements, keeping the DOM in memory and using XPath still looks like the simplest and most flexible approach.

I have done a lot of hard coding for fast lookup of values in a DOM - essentially going directly to what XPath has to do indirectly.

It is orders of magnitude faster but could lead to code that is hard to understand and maintain in your case.
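
To make "going directly" concrete, here is a rough sketch of a hand-coded DOM walk. It is not Bill's actual code: the class DirectDomLookup, its helpers and the sample document are invented for illustration, and it assumes a non-namespace-aware parse where getNodeName() is the plain tag name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: follows a fixed chain of child elements by hand
// instead of asking an XPath engine to interpret an expression.
class DirectDomLookup {

    /** Returns the first direct child element with the given tag name, or null. */
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Walks the given child names from root; returns the text found, or null. */
    static String textAt(Element root, String... path) {
        Element current = root;
        for (String name : path) {
            current = child(current, name);
            if (current == null) {
                return null;
            }
        }
        return current.getTextContent();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        Document doc = parse("<order><customer><name>Ann</name></customer></order>");
        // Hard-coded equivalent of the XPath /order/customer/name
        System.out.println(textAt(doc.getDocumentElement(), "customer", "name"));
    }
}
```

Every path the application needs ends up encoded in Java like this, which is where the maintenance cost comes from.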

If this were my problem, I would try to do some pre-processing to locate DOM Elements which can be characterized and used as starting points for XPath, so that your XPath expression does not have to search the entire DOM but only potentially useful Elements.
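
One possible shape for that pre-processing, as a sketch only: index candidate Elements once, then evaluate short relative XPath expressions against the chosen Element rather than the whole Document. The class AnchoredXPath, the use of an "id" attribute as the characterizing feature, and the sample XML are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical index of "starting point" Elements, keyed by an assumed id attribute.
class AnchoredXPath {
    private final Map<String, Element> anchors = new HashMap<>();
    // XPath objects are not thread-safe; use one instance per thread.
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    /** One pass over the root's children, indexing each by its "id" attribute. */
    AnchoredXPath(Document doc) {
        Node n = doc.getDocumentElement().getFirstChild();
        for (; n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) n;
                if (e.hasAttribute("id")) {
                    anchors.put(e.getAttribute("id"), e);
                }
            }
        }
    }

    /** Evaluates a relative XPath with the indexed Element as the context node. */
    String value(String anchorId, String relativePath) throws Exception {
        Element start = anchors.get(anchorId);
        if (start == null) {
            return null;
        }
        return xpath.evaluate(relativePath, start);
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        Document doc = parse("<orders>"
                + "<order id='A1'><customer><name>Ann</name></customer></order>"
                + "<order id='B2'><customer><name>Bob</name></customer></order>"
                + "</orders>");
        AnchoredXPath lookup = new AnchoredXPath(doc);
        System.out.println(lookup.value("B2", "customer/name"));
    }
}
```

How much this saves depends on the document shape and the XPath implementation, so it is worth measuring before committing to it.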

Bill
 
Nitin Pathak
Ranch Hand
Posts: 70
Thanks Bill! That helps a lot as a starting point. I plan to read the relevant XPath expressions from a properties file (and would later consider deriving them) and to keep XML validation as a separate (first) step using Xerces, which would avoid data binding in this case.
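
A rough sketch of what that two-level plan might look like, for anyone following along. It uses the standard JAXP validation API (javax.xml.validation) rather than Xerces classes directly, and the class ConfiguredLookup, the property keys, the tiny schema and the sample XML are all invented for the example; a real setup would load the properties and XSD from files.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical reader: validates first, then evaluates XPaths taken from Properties.
class ConfiguredLookup {
    private final Schema schema;
    private final Map<String, XPathExpression> expressions = new LinkedHashMap<>();

    ConfiguredLookup(String xsd, Properties xpaths) throws Exception {
        schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (String key : xpaths.stringPropertyNames()) {
            // Compile each expression once and reuse it for every document.
            expressions.put(key, xpath.compile(xpaths.getProperty(key)));
        }
    }

    /** Level 1: validate against the schema. Level 2: evaluate each configured XPath. */
    Map<String, String> read(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // schema validation of a DOM needs this
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        // With no ErrorHandler set, an invalid document causes a SAXException here.
        schema.newValidator().validate(new DOMSource(doc));
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, XPathExpression> e : expressions.entrySet()) {
            values.put(e.getKey(), e.getValue().evaluate(doc));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up schema and properties, inlined to keep the sketch self-contained.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='order'><xs:complexType><xs:sequence>"
                + "<xs:element name='id' type='xs:int'/>"
                + "<xs:element name='name' type='xs:string'/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        Properties props = new Properties();
        props.load(new StringReader("order.id=/order/id\norder.name=/order/name\n"));
        ConfiguredLookup lookup = new ConfiguredLookup(xsd, props);
        Map<String, String> values = lookup.read("<order><id>42</id><name>Ann</name></order>");
        System.out.println(values.get("order.name"));
    }
}
```

Keeping validation as its own step means the XPath layer can assume a well-formed, schema-valid document and stay simple.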

Also, I just wanted to let you know that the OutOfMemory error seems to be resolved after switching to XMLBeans and separating source generation from compilation, but this does not seem to be a foolproof and stable solution.

- Thanks
 