JavaRanch » Java Forums » Engineering » XML and Related Technologies

DOM Multithreaded Read - An Alternative Approach

Ryan Dowdy
Greenhorn

Joined: Aug 03, 2011
Posts: 8
Hello!

I am writing a Java class that reads a user-created XML file and then performs some actions based on the user's input.

The XML parser I am utilizing (javax.xml.parsers) produces a Xerces implementation of the DOM. This is all fine and dandy, except there are many cases where my class makes use of concurrency. Sprinkled throughout my class are numerous methods involving DOM reads, and these methods are frequently called concurrently!

I found out, unfortunately, that the Xerces DOM implementation is not thread-safe, even for read operations.

As you may have realized, I must now find a way to work around this. My first question is this:

Are there alternative DOM implementations that offer at least thread-safe reads? If so, where can I find these? I have done some searching, but to no avail. I am not entirely sure what exactly I need to search for in the first place. This would be my first choice for solving my problem, as it wouldn't require too much modification to my code.

If there are no alternatives, my other question is this:

I have listed some alternatives below that would fit my predicament, along with their disadvantages. My preferred choice is the first one; do you think this is a viable approach? If not, why?

  • I need to traverse the entire DOM (before any new threads are created) to ensure the user entered the correct data for various tags anyway. I could enter the tags and their corresponding data into a tree as I traverse the DOM and use that tree in my operations. Disadvantage: This might require a rather extensive restructuring of my program, putting me behind in this project. However, this would give me an excuse to change the architecture which I'm not quite happy with, so it might be worth it.


  • Synchronize every DOM read. Disadvantage: There are a large number of DOM reads, and I imagine this would slow the concurrent code down to an almost "single-threaded" pace.




Thoughts?
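A minimal sketch of the first option, assuming only JDK classes: walk the DOM once on a single thread before any workers start, copy each element into an immutable object, and hand only those copies to the worker threads. The class names `XmlNode` and `SnapshotDemo`, and the `<config>`/`<job>` elements in the sample input, are hypothetical and not taken from the original program. Because the workers never touch the Xerces `Document`, its internal caches are no longer shared between threads.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical immutable copy of one DOM element and its subtree. */
final class XmlNode {
    final String name;
    final Map<String, String> attributes;
    final String text; // direct text content of this element, trimmed
    final List<XmlNode> children;

    private XmlNode(String name, Map<String, String> attributes, String text, List<XmlNode> children) {
        this.name = name;
        this.attributes = Collections.unmodifiableMap(attributes);
        this.text = text;
        this.children = Collections.unmodifiableList(children);
    }

    /** Recursively copies an element; call this on ONE thread, before workers start. */
    static XmlNode fromElement(Element element) {
        Map<String, String> attrs = new LinkedHashMap<>();
        NamedNodeMap domAttrs = element.getAttributes();
        for (int i = 0; i < domAttrs.getLength(); i++) {
            Node a = domAttrs.item(i);
            attrs.put(a.getNodeName(), a.getNodeValue());
        }
        List<XmlNode> kids = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        NodeList domKids = element.getChildNodes();
        for (int i = 0; i < domKids.getLength(); i++) {
            Node child = domKids.item(i);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                kids.add(fromElement((Element) child));
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return new XmlNode(element.getTagName(), attrs, text.toString().trim(), kids);
    }

    /** First direct child with the given tag name, or null if there is none. */
    XmlNode child(String tag) {
        for (XmlNode c : children) {
            if (c.name.equals(tag)) {
                return c;
            }
        }
        return null;
    }
}

class SnapshotDemo {
    /** Parses XML text with the JDK's DOM parser and returns an immutable copy of the root. */
    static XmlNode parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Only the copy escapes this method; the Document itself is never shared.
            return XmlNode.fromElement(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        XmlNode root = parse("<config><job name=\"a\">1</job><job name=\"b\">2</job></config>");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> results = new ArrayList<>();
        for (XmlNode job : root.children) {
            // Each task reads only the immutable copy, never the DOM.
            results.add(pool.submit(() -> job.attributes.get("name") + "=" + job.text));
        }
        for (Future<String> f : results) {
            System.out.println(f.get()); // prints a=1, then b=2
        }
        pool.shutdown();
    }
}
```

The design relies on the copy being effectively immutable: every field is `final` and no collection is modified after construction, so once the copies are handed to the pool they can be read from any number of threads without locking.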
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
> I am writing a Java class that reads a user-created XML file and then performs some actions based on the user's input.

What are the actions you perform? Do you change the DOM Document based on the user's input and create new XML from it?

If your input XML doesn't change during your actions, why not just read the XML once, populate the information into custom classes, and make the read operations on those thread-safe?

As a side note, why do you need the read operations to be synchronized?
Ryan Dowdy
Greenhorn

Joined: Aug 03, 2011
Posts: 8
> What are the actions you perform? Do you change the DOM Document based on the user's input and create new XML from it?


Sorry, I should have been more specific about these actions! The actions do not alter the DOM in any way, but they do involve reading from the DOM, which is not a thread-safe operation in this implementation.

> If your input XML doesn't change during your actions, why not just read the XML once, populate the information into custom classes, and make the read operations on those thread-safe?


I believe this statement is embodied in my first bullet point, which, as time passes, I am becoming more and more inclined to implement.

> As a side note, why do you need the read operations to be synchronized?


The XML does not change in my actions, but the Document model of it does. The Xerces implementation of the DOM uses a cache for lookups in order to improve performance. My threads use their own copies of Node references in the Document, and they do not alter them, but they do perform reads on them. The Nodes share a common Document, so the Document's cache is modified concurrently by read operations.

I would rather the read operations were not synchronized, but they are not thread-safe, and I can't run them concurrently without synchronizing them. However, synchronizing my read operations would most likely have a significant effect on performance.
John Jai
Bartender

Joined: May 31, 2011
Posts: 1776
Ryan,

You have already investigated a lot. Just to reiterate that the DOM isn't thread-safe: I read in this FAQ that programmers should handle thread safety in their application code.

Now that the thread has been moved to this XML forum, let's wait for some XML gurus to reply with what they think.
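If the DOM has to stay shared, the application-level locking that the FAQ calls for can at least be centralized in one helper. This is a sketch under assumptions: `DomReads` and `DomReadsDemo` are hypothetical names, and it assumes that locking on a node's owning `Document` is sufficient because every node of one tree shares that document's internal state. It takes the coarsest possible lock, so all reads on the same document are fully serialized, which is exactly the slowdown expected from the second option.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: every DOM read for one document goes through one lock. */
final class DomReads {
    private DomReads() {}

    static <T> T read(Node node, Function<Node, T> reader) {
        // All nodes of one tree share their owner Document, so locking on it
        // serializes every read of that tree. A Document node has no owner
        // document (getOwnerDocument() returns null), so it locks on itself.
        Node owner = node.getOwnerDocument();
        Object lock = (owner != null) ? owner : node;
        synchronized (lock) {
            return reader.apply(node);
        }
    }
}

class DomReadsDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Document doc = parse("<config><job name=\"a\"/><job name=\"b\"/></config>");
        // Copy the live NodeList into a plain List while holding the lock.
        List<Element> jobs = DomReads.read(doc, d -> {
            NodeList nl = ((Document) d).getElementsByTagName("job");
            List<Element> out = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                out.add((Element) nl.item(i));
            }
            return out;
        });
        List<Thread> workers = new ArrayList<>();
        for (Element job : jobs) {
            Thread t = new Thread(() -> {
                // Each attribute read takes the shared per-document lock.
                String name = DomReads.read(job, n -> ((Element) n).getAttribute("name"));
                System.out.println(name);
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

The lambdas should return plain values (strings, numbers, copied lists) rather than live `NodeList` objects: a `NodeList` that is read after the lock is released goes straight back to reading the shared document unguarded.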
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Well, personally I find the concept of read-only operations which aren't thread-safe rather offensive from the technical point of view, so that would lead me towards the re-architecting which you're considering.

I realize that isn't really a reason you could defend to a committee which approves system changes, but quite often opinions like this are dolled up to be more committee-friendly and used to get changes made.

Just my two cents.
Ryan Dowdy
Greenhorn

Joined: Aug 03, 2011
Posts: 8
> Well, personally I find the concept of read-only operations which aren't thread-safe rather offensive from the technical point of view, so that would lead me towards the re-architecting which you're considering.
>
> I realize that isn't really a reason you could defend to a committee which approves system changes, but quite often opinions like this are dolled up to be more committee-friendly and used to get changes made.


I too was quite shocked when I learned of this. I suppose XML parsers have more of a need to be efficient due to the possibly large size of XML files. Luckily, this is a new project and I am the only one working on it, so there is no need to consult anyone but myself in order to go through with this re-architecting.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Actually I was looking at the Xerces site for something completely unrelated to this thread today, and I noticed that by default Xerces doesn't even load nodes of the tree from the input document until some code asks for those nodes. So yeah, doing lazy loading like that means that when your code asks for a node, it's possible that a whole lot of changes happen to the internal data structures which Xerces uses to support its DOM implementation. Not surprising that it isn't thread-safe then.

Which means that now I'm not nearly as hostile to the idea. But still if it were me I would re-architect to avoid having to deal with the lack of thread-safety. It's easy to get synchronization wrong, especially if you try to be clever about it. (Here I am producing the committee-friendly reason for re-architecting, even if you don't need it!)
Ryan Dowdy
Greenhorn

Joined: Aug 03, 2011
Posts: 8
> by default Xerces doesn't even load nodes of the tree from the input document until some code asks for those nodes.


Do you know if this is because a SAX parser is used? From what I understand, this sort of "lazy loading" improves efficiency significantly. I was told by a coworker that an alternative to a SAX parser is a DOM parser which loads the entire DOM into memory.

Or is the Xerces internal representation of a DOM separate from this?
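To illustrate the SAX-versus-DOM distinction the coworker drew: a SAX parser pushes events (start tag, text, end tag) to a handler as it streams through the input and never builds a tree, so the application keeps only what it needs. This is a minimal sketch using the JDK's `javax.xml.parsers.SAXParserFactory`; the class `SaxJobNames`, the method `jobNames`, and the `<job name="...">` input format are hypothetical. The result is an ordinary unmodifiable Java list, so once it has been safely handed to other threads it can be read without involving a DOM at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: collect the "name" attribute of every <job> element via SAX. */
final class SaxJobNames {
    private SaxJobNames() {}

    static List<String> jobNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Called once per start tag while the parser streams through the
                // input; no tree of the document is ever built.
                if (qName.equals("job")) {
                    names.add(atts.getValue("name"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return Collections.unmodifiableList(names);
    }

    public static void main(String[] args) {
        System.out.println(jobNames("<config><job name=\"a\"/><job name=\"b\"/></config>")); // prints [a, b]
    }
}
```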
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

I was reading about the Xerces DOM parser. Whether it uses a SAX parser internally or does its own parsing of the document, I don't know. With the optimization I was referring to, the DOM parser doesn't actually load the entire DOM into memory unless and until the application requests all of the nodes (or the last node in the document, or something like that; I didn't look into the details).
 