The NodeList API (and the org.w3c.dom package API) doesn't say anything about thread safety, and in the absence of a statement that they are thread safe, you should assume they are not. That means you should properly synchronize access to the Document, the Nodes, and the NodeList so no two threads can access the data at the same time. If you need simultaneous access, I would suggest giving each thread a different instance of the Document, Node, and NodeList so they don't interfere with each other.
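As a rough illustration of the "one instance per thread" idea, here is a minimal sketch that uses a ThreadLocal so each thread parses and keeps its own Document. The class name PerThreadDocument, the sample XML string, and the countItems() helper are made up for this example and are not from the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PerThreadDocument {
    // Hypothetical sample data; the real application would read its own XML source.
    static final String XML = "<items><item>a</item><item>b</item></items>";

    // Each thread lazily parses its own Document the first time it calls DOC.get().
    private static final ThreadLocal<Document> DOC =
            ThreadLocal.withInitial(PerThreadDocument::parse);

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // The NodeList comes from the calling thread's private Document,
    // so no other thread can touch it.
    static int countItems() {
        NodeList items = DOC.get().getElementsByTagName("item");
        return items.getLength();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " sees " + countItems() + " items");
        Thread first = new Thread(task, "worker-1");
        Thread second = new Thread(task, "worker-2");
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

The cost is one parse and one in-memory tree per thread, so this only makes sense if the document is reasonably small or the number of threads is bounded.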
rohit chavan wrote:The document is being parsed only once per thread (which we can call the set() operation).
Unfortunately, you can't assume the state of the document is not being modified by each thread. I don't think it is safe to read the document once and share it with multiple threads, since it is the root of all searches and therefore the root of all NodeLists. You need to put this 'set' operation into a synchronized block that synchronizes with all the NodeLists. In fact, I would probably use the Document as the object to synchronize on, regardless of the number of NodeLists that exist, since it represents the data that needs to be protected. Every chunk of code that accesses the Document, a Node, or a NodeList would then synchronize on the Document.
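To make the "synchronize on the Document" suggestion concrete, here is a small sketch. The class SharedDocumentAccess and its methods are hypothetical names chosen for illustration. The point is that every read goes through a block locked on the Document, and only plain Strings leave that block, never a Node or NodeList.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SharedDocumentAccess {
    private final Document doc;

    SharedDocumentAccess(Document doc) {
        this.doc = doc;
    }

    // Helper for the example only: parse an XML string into a Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // All DOM access happens while holding the Document's monitor.
    // The values are copied into a plain List before the lock is released,
    // so callers never hold a live NodeList outside the synchronized block.
    List<String> textsOf(String tagName) {
        synchronized (doc) {
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        }
    }
}
```

This only protects the data if every other piece of code that touches the same Document, including the code that builds or replaces it, also synchronizes on that same Document instance.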
I am also checking whether this has anything to do with the heap-space settings. Although I am not getting any out-of-memory error, I wonder if this could be the reason.
I doubt it. What is most likely happening is that multiple different threads are sharing the same reference (probably in the Document instance) to the 'current' or 'root' node. Traversing a list probably modifies this value, and the failure occurs when some other thread tries to access the reference while a different thread is in the process of modifying it. I would expect that even when you don't get the null pointer exception, you probably get several cases where the value you do get is not correct (but that is harder to detect because there is no exception). If you want to use the same document and its data in multiple threads, I think you should create a very robust testing scheme where you can detect when incorrect values are retrieved.
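One possible shape for such a testing scheme is sketched below. It hammers a reader from several threads at once and counts every read that throws or returns something other than the known expected value. The names ConcurrentReadCheck and countBadReads are invented for this sketch, and the thread and iteration counts are arbitrary.

```java
import java.io.StringReader;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ConcurrentReadCheck {
    // Runs reader.get() from several threads and counts reads that either
    // throw (for example an NPE) or return a value different from expected.
    static int countBadReads(Supplier<String> reader, String expected,
                             int threads, int readsPerThread) {
        AtomicInteger bad = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers together to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < readsPerThread; i++) {
                    try {
                        if (!expected.equals(reader.get())) {
                            bad.incrementAndGet();
                        }
                    } catch (RuntimeException e) {
                        bad.incrementAndGet();
                    }
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Readers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bad.get();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<r><n>v</n></r>")));
        // Reader under test: here every access is synchronized on the Document.
        Supplier<String> reader = () -> {
            synchronized (doc) {
                return doc.getElementsByTagName("n").item(0).getTextContent();
            }
        };
        System.out.println("bad reads: " + countBadReads(reader, "v", 8, 10_000));
    }
}
```

A count of zero does not prove the access pattern is safe, since races are timing dependent, but a non-zero count is a clear sign that something is wrong even when no exception reaches the caller.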
William Brogden wrote:I would certainly be using the getLength() method of NodeList - when it indicates an empty list you can investigate. Much preferable to looking at an NPE.
He actually is doing that, but there is an inconsistency: he gets a length > 0 but still gets an NPE, and if he tries a second time, he doesn't get the NPE. This inconsistency is what makes me believe he is still not synchronizing all the access correctly, or he would be better off with copies of the document, rather than multiple views of a single instance.
Steve Luke wrote:
This inconsistency is what makes me believe he is still not synchronizing all the access correctly, or he would be better off with copies of the document, rather than multiple views of a single instance.
Making the read access synchronized takes an obvious hit on performance.
So I am trying to create copies of the Document object. I think using Commons Pool should be my obvious choice.
Or am I misreading my requirement here, and should simply making a fixed number of copies of the currently singleton Document object be my first concern?
I will certainly test both approaches.
Meanwhile, please let me know if there is a better approach.
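For comparison with a full pooling library, a fixed set of copies can be sketched with nothing but the JDK. The example below is only an illustration of that idea, not a Commons Pool implementation: it uses an ArrayBlockingQueue as a stand-in pool, and the names DocumentCopies, fromXml, copyOf, and withDocument are hypothetical. Each copy is built with Document.importNode so it shares no nodes with the original.

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DocumentCopies {
    private final BlockingQueue<Document> idle;

    // Builds all copies up front, on the constructing thread only.
    DocumentCopies(Document template, int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(copyOf(template));
        }
    }

    // Convenience factory for the example: parse once, then copy.
    static DocumentCopies fromXml(String xml, int size) {
        try {
            Document template = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new DocumentCopies(template, size);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Deep-copies the root element into a brand-new Document.
    static Document copyOf(Document source) {
        try {
            Document copy = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            copy.appendChild(copy.importNode(source.getDocumentElement(), true));
            return copy;
        } catch (Exception e) {
            throw new IllegalStateException("Could not copy document", e);
        }
    }

    // Borrows a copy, runs the work on it, and always returns it to the queue.
    // Blocks if every copy is currently in use.
    <T> T withDocument(Function<Document, T> work) {
        Document doc;
        try {
            doc = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a copy", e);
        }
        try {
            return work.apply(doc);
        } finally {
            idle.add(doc);
        }
    }
}
```

A caller would write something like pool.withDocument(d -> d.getElementsByTagName("n").getLength()). The work function should return plain values rather than Nodes or NodeLists, since those still belong to the borrowed copy after it goes back into the queue. If the original document changes, every copy has to be rebuilt, which is the main trade-off against synchronizing on a single instance.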