JavaRanch » Java Forums » Engineering » XML and Related Technologies

Slicing XML document

Tom Stevns
Ranch Hand

Joined: Nov 20, 2001
Posts: 120
Hello!

I have an XML document containing 200,000 elements.
Each of those elements should be written to its own new XML file, giving 200,000 files.

I would appreciate hearing your thoughts on this problem.

NB! No validation is needed.

The input should be streamed.

So far the options I am considering are SAX2, DOM2, XPath, XSLT, or even just some of the Java String methods.

Thanks in advance


Regards Tom Stevns, SCJP2
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18909

You are just looking for suggestions, correct? Then I would suggest using SAX for the input.
John Simpson
Greenhorn

Joined: Sep 10, 2007
Posts: 25
JAXB is what I am using; it's fairly straightforward. Just a suggestion.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12825
Sounds like basic SAX processing to me with a whole #@load of FileWriter creations.
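A minimal sketch of the SAX-plus-FileWriter idea, assuming the 200,000 records are the direct children of the root element and that no namespace processing is needed. The class name `SaxFileSplitter` and the output names `record-1.xml`, `record-2.xml`, ... are hypothetical. The handler re-serializes each record's SAX events into its own file, opening a `FileWriter` when the record starts and closing it when the record ends (the `FileWriter(File, Charset)` constructor needs Java 11+). Comments, processing instructions and whitespace between records are dropped, and empty elements come out as `<x></x>`.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Writes each child of the root element to its own file: record-1.xml, record-2.xml, ... */
class SaxFileSplitter extends DefaultHandler {
    private final File outDir;
    private int depth = 0;   // 1 = inside the root, 2 = a record's own element, deeper = its children
    private int count = 0;   // records written so far
    private Writer out;      // open only while inside a record

    SaxFileSplitter(File outDir) { this.outDir = outDir; }

    int getCount() { return count; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        depth++;
        try {
            if (depth == 2) {   // a record starts: one new FileWriter per record
                count++;
                out = new FileWriter(new File(outDir, "record-" + count + ".xml"), StandardCharsets.UTF_8);
                out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            }
            if (out != null) {  // re-serialize the start tag and its attributes
                out.write('<' + qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    out.write(' ' + atts.getQName(i) + "=\"" + escape(atts.getValue(i)) + '"');
                }
                out.write('>');
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        try {
            if (out != null) {
                out.write("</" + qName + '>');
                if (depth == 2) {   // the record is complete: close its file
                    out.close();
                    out = null;
                }
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (out == null) return;   // text between records (e.g. indentation) is dropped
        try {
            out.write(escape(new String(ch, start, length)));
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    private static String escape(String s) {   // '&' must be replaced first
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws Exception {
        SaxFileSplitter splitter = new SaxFileSplitter(new File(args[1]));
        SAXParserFactory.newInstance().newSAXParser().parse(new File(args[0]), splitter);
        System.out.println("Wrote " + splitter.getCount() + " files");
    }
}
```

Run as `java SaxFileSplitter input.xml outDir`. Because SAX streams the input and only one output file is open at a time, memory use should stay flat regardless of the input size.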

Where in the world are these 200,000 files going? I would worry that having that many files in one directory would surely strain any operating system. If the files are just an intermediate step and get consumed by another process, there may be a much easier way to get the job done.

Bill
Rahul Bhattacharjee
Ranch Hand

Joined: Nov 29, 2005
Posts: 2308
Originally posted by Paul Clapham:
You are just looking for suggestions, correct? Then I would suggest using SAX for the input.


+1


Rahul Bhattacharjee
Tom Stevns
Ranch Hand

Joined: Nov 20, 2001
Posts: 120
Hello

Thank you for the suggestions.

Regarding: Where in the world are these 200,000 files going?

William Brogden - These files are put onto an MQ queue in slices, because our MQ system does not allow messages as large as half a gigabyte.

I did this with SAX and "Streaming a large HTTP file" about four years ago, but this time it has to be done in a more "generic" (I hate that word) fashion.

Have a nice day or evening, all of you.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12825
Originally posted by Tom Stevns:
William Brogden - These files are put onto an MQ queue in slices, because our MQ system does not allow messages as large as half a gigabyte.


So you are not really writing files to disk but creating messages and adding them to an MQ server. Therefore everything depends on what your MQ messages have to be constructed from.

The simplest approach (assuming you want to use XML parsing) would be to build each message from a String. For each element that becomes a separate document, you would create a StringWriter when its startElement event fires, write to it as the events for the contained tags occur, and finally create the MQ message when the matching endElement event occurs.
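A sketch of this StringWriter approach, assuming the element that becomes a message is identified by its name (`recordName`) and that no namespace processing is needed; an element of the same name nested inside a record simply stays part of the outer record. The MQ call itself is not shown: the hypothetical `Consumer<String> sender` stands in for whatever code puts a message on the queue, and `SaxMessageSlicer` is likewise a made-up name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Collects each <recordName> element, children included, into a String and passes it on. */
class SaxMessageSlicer extends DefaultHandler {
    private final String recordName;        // element that becomes one message (assumption)
    private final Consumer<String> sender;  // hypothetical stand-in for the MQ "put" call
    private StringWriter buf;               // non-null while inside a record
    private int nesting;                    // elements currently open inside the record

    SaxMessageSlicer(String recordName, Consumer<String> sender) {
        this.recordName = recordName;
        this.sender = sender;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (buf == null && qName.equals(recordName)) {
            buf = new StringWriter();       // startElement of a record: new message buffer
        }
        if (buf == null) return;            // outside any record: ignore
        nesting++;
        buf.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            buf.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i))).append('"');
        }
        buf.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buf != null) buf.append(escape(new String(ch, start, length)));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buf == null) return;
        buf.append("</").append(qName).append('>');
        if (--nesting == 0) {               // endElement of the record: create the message
            sender.accept(buf.toString());
            buf = null;
        }
    }

    private static String escape(String s) {   // '&' must be replaced first
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<batch><order n=\"1\"><item>a&lt;b</item></order><order n=\"2\"/></batch>";
        List<String> sent = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SaxMessageSlicer("order", sent::add));
        sent.forEach(System.out::println);
    }
}
```

Only the record currently being built is held in memory, so the half-gigabyte input never has to fit in RAM at once.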

Even simpler would be to read the file as text and locate the start and end elements by literally doing String operations line by line, but then you lose the parser's error checking.
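A sketch of the text-only approach, under the explicit assumption that every record's start tag begins a line and its end tag ends a line (typical of pretty-printed output). `LineSlicer` and `slice` are hypothetical names. As noted above, nothing checks well-formedness, and a record nested inside another record of the same name, a self-closing record, or a matching tag inside a comment or CDATA section would all confuse it.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

/** Text-only slicing: no XML parser is involved, so nothing checks well-formedness. */
class LineSlicer {
    /**
     * Assumes each record's start tag begins a line ("<name>" or "<name ...>")
     * and its end tag "</name>" ends a line. Returns the number of records found.
     */
    static int slice(Reader input, String name, Consumer<String> sink) throws IOException {
        BufferedReader in = new BufferedReader(input);
        String openPlain = "<" + name + ">";
        String openWithAttrs = "<" + name + " ";
        String close = "</" + name + ">";
        StringBuilder current = null;   // non-null while inside a record
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String t = line.trim();
            if (current == null && (t.startsWith(openPlain) || t.startsWith(openWithAttrs))) {
                current = new StringBuilder();          // start tag found: begin a record
            }
            if (current == null) continue;              // between records: skip the line
            current.append(line).append('\n');
            if (t.endsWith(close)) {                    // end tag found: the record is complete
                sink.accept(current.toString().trim());
                current = null;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new FileReader(args[0])) {
            int n = slice(r, args[1], System.out::println);
            System.err.println(n + " records");
        }
    }
}
```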

In any case, I bet the limiting factor will be the speed at which the MQ server accepts messages.

Bill
Tom Stevns
Ranch Hand

Joined: Nov 20, 2001
Posts: 120
Hello William!

Originally posted by William Brogden:
In any case I bet the limiting factor will be the speed at which the MQ server accepts messages.

I will give you an answer tomorrow, because I've already created and integration-tested that part. I just have to write a test that generates 200,000 (2*10^5) dummy messages and writes them to an input queue.

About the implementation: the only thing I can be sure of is that the XML is well-formed. Therefore it is too risky to make the solution depend on how the input is split into lines.

I think an XMLReader and an XMLWriter combined with an iterator will do the job.
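Which reader and writer are meant is not spelled out. One plausible reading is StAX (`javax.xml.stream`, part of the JDK since Java 6), where `XMLEventReader` implements `java.util.Iterator` and events can be copied one by one to an `XMLEventWriter`. The sketch below assumes the records are the children of the root element and do not rely on namespace declarations made outside the record; `StaxSlicer`, `slice` and `copyElement` are hypothetical names. `copyElement` is close to a "grab the whole element, including its children" helper.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxSlicer {
    /** Streams the input and passes each child of the root element to sink as its own String. */
    static int slice(Reader input, Consumer<String> sink) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        int depth = 0, count = 0;
        while (reader.hasNext()) {                      // XMLEventReader is an Iterator
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 2) {                       // a record starts: copy its whole subtree
                    sink.accept(copyElement(event, reader, outFactory));
                    count++;
                    depth--;                            // copyElement consumed the matching end tag
                }
            } else if (event.isEndElement()) {
                depth--;
            }
        }
        reader.close();
        return count;
    }

    /** Copies one element, from its start event through its matching end event, into a String. */
    static String copyElement(XMLEvent start, XMLEventReader reader, XMLOutputFactory factory)
            throws XMLStreamException {
        StringWriter buf = new StringWriter();
        XMLEventWriter writer = factory.createXMLEventWriter(buf);
        writer.add(start);
        int nesting = 1;
        while (nesting > 0) {                           // stop after the matching end element
            XMLEvent e = reader.nextEvent();
            if (e.isStartElement()) nesting++;
            else if (e.isEndElement()) nesting--;
            writer.add(e);
        }
        writer.close();
        return buf.toString();
    }
}
```

Unlike the line-based variant, this never depends on where the line breaks fall, only on the input being well-formed, and the event writer takes care of escaping text and attribute values.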

Anyway, I just have to try it, although the ultimate code in fewer than 15 lines would be nice. There must be an XML API with a String-like method that simply grabs the whole content of an XML element, including its child elements. ;)
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12825
Originally posted by Tom Stevns:
There must be an XML API with a String-like method that simply grabs the whole content of an XML element, including its child elements. ;)


You might find a "pipeline"-style processing toolkit that could do the job; see my summary article on pipeline toolkits. The ServingXML toolkit looks like your best bet.

Bill
Tom Stevns
Ranch Hand

Joined: Nov 20, 2001
Posts: 120
Thank you very much, William!

I look forward to reading about it tomorrow.

Sweet dreams |o)
 
 