Dynamic translation of XML into CSV using XSD

Johnny Augustus
Greenhorn

Joined: Oct 07, 2006
Posts: 18
Hi,

I have a web application that needs to accept an XSD and a corresponding XML as input via form parameters.

I need to parse the XSD, identify the simple and complex types and other necessary information, and represent this in some sort of structural format (preferably a tree hierarchy).
This structural information will then be used to parse the corresponding XML file, record by record, into a CSV file.
The XSD will use only a limited set of XSD constructs (namespaces and imports can be ignored for the time being).

What is the best way that I can go forward with this?

Options considered:
XSD-Java binding tools (XMLBeans, JAXB).
I can get a type hierarchy using these tools.
countries
  - country
    - id
    - name
    - states
      - state
        - id
        - name

This would be the ideal scenario, as I could create a new instance hierarchy based on the above type hierarchy (acting as an intermediate in-memory representation), populate the simple types (or attributes) with their respective values as I parse through the XML, and write it to the file. However, this option cannot be considered because of the high number of class files that would be generated on the system: for each XSD uploaded, the system would have to generate a set of class files, and this is not acceptable.

XSOM
Using XSOM, I can create a simple tree structure with two user-defined types, 'ComplexType' and 'SimpleType':

ComplexType
  - Name
  - Set of simple types
  - Set of complex types

SimpleType
  - Name
  - Value

The problem here is that I am not sure how to go about parsing the XML and keeping an intermediate representation that could be committed to the CSV file. DOM is not an option at all due to memory constraints.

Example:
XML


CSV


Awaiting your feedback and comments on this. I have been racking my brain for the past few days trying to figure out the most efficient and scalable solution.

Many thanks,
J
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
    
    5
Yow what a problem. Off the top of my head.

It seems to me that the essential thing about a CSV file is that you have to have all of the bits before you can write the line whereas in XML the bits that belong together can be separated by a considerable distance.

This would seem to call for constructing some sort of template that represents a line; one or more of the Java collections would be involved. Using SAX or the new StAX API (in Java 6), you would scan the XML input and fill the template(s), and when a template is full, write it to the output.
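A minimal sketch of the "fill a template, write when full" idea, using SAX. The class name `TemplateFillingHandler`, the flat field list, and the in-memory `lines` list (standing in for a real CSV writer) are all assumptions made for illustration. It also assumes that each field name occurs exactly once per record, which will not hold for every XSD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: fills a one-line template and emits it once every field has a value.
class TemplateFillingHandler extends DefaultHandler {
    private final List<String> fields;                        // column names, in output order
    private final Map<String, String> template = new HashMap<>();
    private final StringBuilder text = new StringBuilder();
    final List<String> lines = new ArrayList<>();             // stands in for the CSV writer

    TemplateFillingHandler(List<String> fields) {
        this.fields = fields;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                                    // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                       // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (fields.contains(qName)) {
            template.put(qName, text.toString().trim());
        }
        text.setLength(0);
        if (template.size() == fields.size()) {               // template is full: write the line
            List<String> row = new ArrayList<>();
            for (String f : fields) {
                row.add(template.get(f));
            }
            lines.add(String.join(",", row));
            template.clear();
        }
    }

    static List<String> convert(String xml, List<String> fields) throws Exception {
        TemplateFillingHandler handler = new TemplateFillingHandler(fields);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rows><row><name>Ann</name><id>1</id></row>"
                   + "<row><id>2</id><name>Bob</name></row></rows>";
        convert(xml, List.of("id", "name")).forEach(System.out::println);
    }
}
```

Because the line is only written once all fields are present, the order in which the fields appear inside a record does not matter. No CSV quoting is done here.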

Bill
Johnny Augustus
Greenhorn

Joined: Oct 07, 2006
Posts: 18
What I have in mind is an array of key-value pairs. I could parse the XSD, identify all the simple types (along with their position relative to the root), and initialize the array keys with the simple type names,
eg:
[(countries_country_id=''),(countries_country_name=''),(countries_country_states_state_id=''),(countries_country_states_state_name='')]
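The key list above could be derived from the type tree with a depth-first walk. The sketch below is only an illustration: `ComplexType` here is a hand-built stand-in for whatever the XSD parser produces (it is not the XSOM API), and `SchemaKeys` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the schema, mirroring the ComplexType idea from the first post.
class ComplexType {
    final String name;
    final List<String> simpleTypes = new ArrayList<>();        // names of leaf elements
    final List<ComplexType> complexTypes = new ArrayList<>();  // nested complex children

    ComplexType(String name) {
        this.name = name;
    }

    ComplexType simple(String... names) {
        simpleTypes.addAll(List.of(names));
        return this;
    }

    ComplexType child(ComplexType c) {
        complexTypes.add(c);
        return this;
    }
}

class SchemaKeys {
    // Returns one key per simple type, built from the element names on the path from the root.
    static List<String> flatten(ComplexType root) {
        List<String> keys = new ArrayList<>();
        walk(root, root.name, keys);
        return keys;
    }

    // Depth-first: the leaves of a type come first, then its nested complex types.
    private static void walk(ComplexType type, String path, List<String> keys) {
        for (String leaf : type.simpleTypes) {
            keys.add(path + "_" + leaf);
        }
        for (ComplexType c : type.complexTypes) {
            walk(c, path + "_" + c.name, keys);
        }
    }

    public static void main(String[] args) {
        ComplexType state = new ComplexType("state").simple("id", "name");
        ComplexType states = new ComplexType("states").child(state);
        ComplexType country = new ComplexType("country").simple("id", "name").child(states);
        ComplexType countries = new ComplexType("countries").child(country);
        System.out.println(flatten(countries));
    }
}
```

Joining with "_" is ambiguous if element names themselves contain underscores, so a different separator may be safer in practice.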

Further on, I could parse the XML (using SAX/StAX) and populate the above array one simple type value at a time. I could use a stack to maintain state by pushing the element names. Once I encounter a type that is already present in the stack, that marks the end of a line (or record) in the CSV file. The contents of the array will then be written to the CSV file, the array will be reset (sanity check), and parsing will continue.
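One possible reading of this plan, sketched with StAX. The stack of element names yields the key for each leaf, and a key that already holds a value is taken as the start of a new line. Note one deliberate deviation, which is my assumption rather than part of the plan: instead of resetting the whole array, the sketch clears only the repeated column and the columns after it, so that parent values (country id and name) carry over to the next state. The class name `StackRecordSketch` is hypothetical, and optional or out-of-order elements are not handled.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StackRecordSketch {
    // keys: path-style column names in output order, e.g. "countries_country_id".
    static List<String> convert(String xml, List<String> keys) throws Exception {
        String[] row = new String[keys.size()];
        Deque<String> path = new ArrayDeque<>();      // element names from the root to the current node
        StringBuilder text = new StringBuilder();
        List<String> lines = new ArrayList<>();       // stands in for the CSV writer
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    path.addLast(reader.getLocalName());
                    text.setLength(0);
                    break;
                }
                case XMLStreamConstants.CHARACTERS: {
                    text.append(reader.getText());
                    break;
                }
                case XMLStreamConstants.END_ELEMENT: {
                    int col = keys.indexOf(String.join("_", path));
                    if (col >= 0) {
                        if (row[col] != null) {
                            // Column already filled: the previous line is complete.
                            lines.add(toLine(row));
                            Arrays.fill(row, col, row.length, null);  // keep the parent columns
                        }
                        row[col] = text.toString().trim();
                    }
                    path.removeLast();
                    text.setLength(0);
                    break;
                }
                default:
                    break;
            }
        }
        if (Arrays.stream(row).anyMatch(v -> v != null)) {
            lines.add(toLine(row));                   // flush the last line
        }
        return lines;
    }

    private static String toLine(String[] row) {
        List<String> cells = new ArrayList<>();
        for (String v : row) {
            cells.add(v == null ? "" : v);
        }
        return String.join(",", cells);
    }
}
```

Only one line's worth of values is held in memory at any time, which fits the constraint that rules out DOM.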

I was also thinking of replacing the array of key-value pairs with a type that extends HashMap, but I really do not need the flexibility a HashMap offers. Also, the ordering of elements might become an issue and I would have to move over to a TreeMap (performance hit?). I feel that in the case of a deeply nested XML, hashing will provide a considerable performance improvement. So it's going to be a trade-off here.
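To make the ordering concern concrete, here is a small comparison of the standard map types. HashMap gives no ordering guarantee and TreeMap sorts by key, which is not necessarily document order. LinkedHashMap, which is not mentioned in this thread, keeps insertion order while still hashing. The class name and sample keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOrderDemo {
    // Inserts the same three entries into the given map and returns its key iteration order.
    static List<String> keyOrder(Map<String, String> map) {
        map.put("country_name", "India");
        map.put("country_id", "1");
        map.put("state_name", "Kerala");
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap: the order is unspecified and may differ between JVM versions.
        System.out.println("HashMap:       " + keyOrder(new HashMap<>()));
        // TreeMap: sorted by key, so country_id comes before country_name.
        System.out.println("TreeMap:       " + keyOrder(new TreeMap<>()));
        // LinkedHashMap: the order in which the keys were first inserted.
        System.out.println("LinkedHashMap: " + keyOrder(new LinkedHashMap<>()));
    }
}
```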

I guess I'll take a couple more days to finalize on the best way out. Meanwhile, if you could provide me with any alternative approaches, it would be really helpful.

Many thanks,
J
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12769
    
    5
I think you are on the right track. To solve the ordering problem you just need an array of field names in the desired order (as per the first line of the CSV file) then you can pull values out of the Map in that order.
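A short sketch of that suggestion: the array fixes the column order, and the Map can be any implementation, since its own ordering is never used. The class name `OrderedCsvLine` and the quoting rule (double quotes around values containing a comma, quote, or newline) are assumptions added for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

class OrderedCsvLine {
    // Wraps a value in double quotes when it contains a comma, a quote, or a newline.
    static String quote(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Pulls values out of the map in the order given by fieldOrder; missing fields become empty cells.
    static String toLine(String[] fieldOrder, Map<String, String> values) {
        StringJoiner line = new StringJoiner(",");
        for (String field : fieldOrder) {
            line.add(quote(values.getOrDefault(field, "")));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] fieldOrder = {"countries_country_id", "countries_country_name"};
        Map<String, String> values = new HashMap<>();
        values.put("countries_country_name", "India");
        values.put("countries_country_id", "1");
        System.out.println(String.join(",", fieldOrder));   // header line
        System.out.println(toLine(fieldOrder, values));     // data line
    }
}
```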

I suspect that each new XSD you encounter will require a new refinement.

Let us know how this turns out.

Bill
Johnny Augustus
Greenhorn

Joined: Oct 07, 2006
Posts: 18
Sure, I will keep this post updated. Thanks for the feedback.