XML Log performance

Dorothy Taylor
Ranch Hand

Joined: Nov 26, 2007
Posts: 104
Hi

I have an XML log file to which I have to make very frequent updates (inserts, edits, deletes). I will be using the Xerces parser to parse this log file. I want to know how to get good performance, given that I will have to frequently load the XML into memory, make updates to it, write it back, read it into a DOM again, and so on.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12675
Well, you don't have to do it that way. For additions, just write to the end of your file in an XML format.

You will NOT have an XML document suitable for directly parsing, but all you need to do is add the ROOT element start and end tags and bingo - ready to parse.



...


Works like a charm, use it all the time.

With SAX parsing the log file can be huge but your memory use is small. For inserts and deletions write a new file from the SAX events.
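The example originally posted here did not survive, so what follows is only a minimal sketch of the wrapping idea, not Bill's code. It assumes the log file holds a series of root-less fragments and no XML declaration. The element names `log` and `entry` and the class `WrappedLogReader` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses a file of root-less XML fragments with SAX.
class WrappedLogReader {

    /** Counts <entry> elements (an assumed element name) in the fragment file. */
    static int countEntries(Path fragmentFile) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("entry".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try (InputStream body = Files.newInputStream(fragmentFile)) {
            // Supply the root start and end tags around the file content on the fly.
            List<InputStream> parts = Arrays.asList(
                    new ByteArrayInputStream("<log>".getBytes(StandardCharsets.UTF_8)),
                    body,
                    new ByteArrayInputStream("</log>".getBytes(StandardCharsets.UTF_8)));
            InputStream wrapped = new SequenceInputStream(Collections.enumeration(parts));
            SAXParserFactory.newInstance().newSAXParser().parse(wrapped, handler);
        }
        return count[0];
    }
}
```

The file on disk is never a complete document; the root tags exist only in the stream handed to the parser, so appending stays a plain end-of-file write.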

Bill

Java Resources at www.wbrogden.com
Dorothy Taylor
Ranch Hand

Joined: Nov 26, 2007
Posts: 104
Further, can multiple parallel threads write to this XML file?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18115

Sure, just synchronize the method which writes to the XML so that only one thread at a time can be writing.
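As a rough sketch of that suggestion, assuming the append-only fragment approach described earlier in the thread: the class `FragmentLog`, the `entry` element and its `task` and `status` attributes are all hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only writer shared by all worker threads.
class FragmentLog {
    private final Path file;

    FragmentLog(Path file) {
        this.file = file;
    }

    /** Appends one fragment; synchronized so only one thread writes at a time. */
    synchronized void append(String task, String status) throws IOException {
        String line = "<entry task=\"" + escape(task) + "\" status=\"" + escape(status) + "\"/>\n";
        Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Escapes the characters that would break a double-quoted attribute value. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

The lock only protects writers that share the same `FragmentLog` instance; separate instances or separate processes writing to the same file would need some other form of coordination.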
Dorothy Taylor
Ranch Hand

Joined: Nov 26, 2007
Posts: 104
William Brogden wrote:
With SAX parsing the log file can be huge but your memory use is small. For inserts and deletions write a new file from the SAX events.


I did not quite get what you are advising about insertions and deletions. I mean, is serializing the DOM as XML the better approach, or is re-creating the same file again and again better? Moreover, I had read somewhere that SAX can be used only for read operations, and that for editing we can use only DOM.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12675
If reading and writing the DOM works for you without running into performance problems, by all means stick with it.

The point about SAX is that the SAX parser generates an event for every data item in the XML Infoset.

Therefore, if your code handles every event, you can write to a new file exactly all the data being parsed from the XML input document.

Furthermore, with some non-trivial programming, you can detect elements to be removed and locations to add new elements.
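One possible way to do this with the standard JAXP classes is an `XMLFilterImpl` that sits between the parser and an identity `Transformer`, passing events through and swallowing the ones to be deleted. This is only a sketch under assumptions: the rule "drop every `task` whose `status` is `end`" and the class name `DropFinishedFilter` are invented for illustration, and a real filter would also need to handle the other event types (ignorable whitespace, processing instructions) inside removed elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: copies every SAX event except those belonging to
// <task status="end"> elements (an assumed deletion rule).
class DropFinishedFilter extends XMLFilterImpl {
    private int skipDepth = 0; // greater than 0 while inside an element being removed

    DropFinishedFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        boolean remove = "task".equals(qName) && "end".equals(atts.getValue("status"));
        if (skipDepth > 0 || remove) {
            skipDepth++; // track nesting so the matching end tags are dropped too
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses the input through the filter and serializes the surviving events. */
    static String rewrite(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Transformer copy = TransformerFactory.newInstance().newTransformer();
        copy.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        copy.transform(
                new SAXSource(new DropFinishedFilter(reader), new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

Insertions would work the same way in reverse: at the chosen location the filter calls `super.startElement(...)` and `super.endElement(...)` for elements that were never in the input. Strings are used here only to keep the sketch short; for a large file the source and result would be streams.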

Bill
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18115

Dorothy Taylor wrote: I have an xml log file in which I have to make very frequent updates (Inserts, edits, deletes).


I think I may be misunderstanding this statement. I'm certainly confused.

First you say you have a log file. To me a log file is a file where you write out some kind of a record of what happened in some system. For example you might have a record which represents Sara signing in or Costco's order being processed.

And then you say you have to make frequent updates. Now, since the concept of a log file doesn't involve changing anything once it is written, I read that as meaning you had some system which was updating some other file, and you were logging those updates in your log file. And I think maybe Bill read it that way initially as well.

But it looks like I read that wrong. You actually have something which you call a "log file" and yet you are going back and changing records which were written earlier into that log file. Is that correct?
Dorothy Taylor
Ranch Hand

Joined: Nov 26, 2007
Posts: 104
Yes, the word 'log' may not be accurate here. Basically, I have an input XML file that contains some tasks. I need to execute those tasks sequentially or in parallel. As these tasks get executed, I have to record their statuses in a new XML file (which I create if it does not already exist). I am referring to this output XML as the 'log' here. So for e.g., if there is a in the input xml, then I write in the log file. This would mean that control reached task 'a' and assigned a thread to it. Now if 'a' is complete, I will change the status to 'end'. So an operations person, by looking at the XML, can know exactly what is happening as the input XML is being processed. So the question is: what should I do to create the output XML?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18115

Well, since you chose to make the log file an XML document, then it follows that every time you want to change the log file in any way, you have to read the whole thing in and write the whole, modified, thing out.

And it seems to me that you already know how to do this. So I'm not sure what your question is now.
Dorothy Taylor
Ranch Hand

Joined: Nov 26, 2007
Posts: 104

Yes, that was what I initially thought ought to be done, but William advised against it and suggested using a SAX parser for writing the XML file and DOM for parsing, so that performance is not impacted.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18115

Seems to me that if performance was going to be an issue, you wouldn't have chosen XML for this task in the first place. And anyway the majority of the processing time is going to be in reading data from the file and writing it to the new file. What you do while the XML is in your code is going to be a very small part of the processing, so optimizing that is going to have very little effect.
Dorothy Taylor
Ranch Hand

Joined: Nov 26, 2007
Posts: 104
Yes, but for every task that is performed, I need to visit the output XML twice to update the status and then write it to the file system. So again, the question is: what is the best way to do this? Is it possible to load the DOM into memory just once, or do we have to do that again and again for every task?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18115

That's a good point. You could load the document into a DOM once, when the application starts, and repeatedly modify the DOM and write it out. That would at least save the step of reading it in each time.
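A minimal sketch of that load-once approach, using only standard JAXP calls. The markup from the earlier post was lost, so the `tasks` root, the `task` element and its `name` and `status` attributes are assumptions, as is the class name `StatusDocument`. This version is not thread-safe.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical holder: parses the status file once, then only modifies and writes.
class StatusDocument {
    private final Path file;
    private final Document doc;

    StatusDocument(Path file) throws Exception {
        this.file = file;
        this.doc = load(file);
    }

    /** Reads the file if it has content, otherwise starts an empty <tasks> document. */
    private static Document load(Path file) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        if (Files.exists(file) && Files.size(file) > 0) {
            try (InputStream in = Files.newInputStream(file)) {
                return builder.parse(in);
            }
        }
        Document fresh = builder.newDocument();
        fresh.appendChild(fresh.createElement("tasks"));
        return fresh;
    }

    /** Updates (or creates) the task entry in memory, then writes the whole DOM out. */
    void setStatus(String name, String status) throws Exception {
        Element task = find(name);
        if (task == null) {
            task = doc.createElement("task");
            task.setAttribute("name", name);
            doc.getDocumentElement().appendChild(task);
        }
        task.setAttribute("status", status);
        save();
    }

    /** Returns the recorded status, or null if the task has no entry yet. */
    String statusOf(String name) {
        Element task = find(name);
        return task == null ? null : task.getAttribute("status");
    }

    private Element find(String name) {
        NodeList tasks = doc.getElementsByTagName("task");
        for (int i = 0; i < tasks.getLength(); i++) {
            Element e = (Element) tasks.item(i);
            if (name.equals(e.getAttribute("name"))) {
                return e;
            }
        }
        return null;
    }

    private void save() throws Exception {
        try (OutputStream out = Files.newOutputStream(file)) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```

Usage would be along the lines of `status.setStatus("a", "start")` when a task is picked up and `status.setStatus("a", "end")` when it completes. Someone reading the file while it is being rewritten may see a partial document; writing to a temporary file and renaming it is one common way around that.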

But if it were me, I would really have chosen some other format which didn't force me into discussions like this. A database, perhaps. Or monitoring via JMX. I'm sure there are many other possibilities.
Dorothy Taylor
Ranch Hand

Joined: Nov 26, 2007
Posts: 104
I cannot choose another format because (i) I want a format that is human-readable, and (ii) my XML will not be too big. It would be something like 5-10 KB max.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18115

Human-readable? Well, it's debatable whether XML fits into that category. But you seem to have limited your possibilities to only allowing the humans to look at your log file directly.
Dorothy Taylor
Ranch Hand

Joined: Nov 26, 2007
Posts: 104
Yeah, so I guess this design is good: we load the DOM into memory once and keep writing updates to the file system as they come. But I now have a doubt about whether this in-memory DOM can be made synchronized, so that parallel threads attempting to modify it do not conflict. Similarly, the method that writes the DOM tree to the file system will be synchronized. But how can we make the in-memory DOM synchronized? Is it possible to do that?
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12675
But how can we make the in-memory DOM synchronized? Is it possible to do that?


Sure, why not? No different from any other data structure that needs to be protected against simultaneous access.
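One way to read that advice is to funnel every access to the `Document` through a single lock, as with any other shared mutable structure. The wrapper below is a sketch under that assumption; `GuardedDocument` and its methods are hypothetical names, not part of any library. Reads go through the lock as well, since the standard DOM API makes no thread-safety promises even for read-only access.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import org.w3c.dom.Document;

// Hypothetical wrapper: the Document is reachable only while the lock is held.
class GuardedDocument {
    private final Document doc;
    private final Object lock = new Object();

    GuardedDocument(Document doc) {
        this.doc = doc;
    }

    /** Applies a modification while holding the lock. */
    void update(Consumer<Document> change) {
        synchronized (lock) {
            change.accept(doc);
        }
    }

    /** Runs a query while holding the lock and returns its result. */
    <T> T read(Function<Document, T> query) {
        synchronized (lock) {
            return query.apply(doc);
        }
    }
}
```

Writing the tree to the file system would go through `read(...)` too, so that no thread changes the DOM halfway through serialization. The protection holds only as long as callers do not keep references to nodes and touch them outside the lock.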

Bill
 