In our production cluster environment we use WebLogic v923 and Java v5.
We have many queues that receive a high volume of messages per day.
The JMS queues are set up with a redelivery limit of zero, and messages are sent to an error queue when they fail to be processed by the queue's related MDB due to an application-logic exception.
All the queues, including the error queues, have persistent stores, each of which is a database table (one table per queue).
The WebLogic Oracle non-XA JDBC driver is used.
This mechanism is working fine.
But now and then a strange situation is observed. On different days, 5% to 10% of the messages that failed due to an application-logic exception are lost.
They are not in the persistent store table, and there are no signs of the messages being processed by the message-driven bean associated with the error queue.
Unfortunately, the WebLogic .out files for the dates on which these failures happened were purged due to disk space issues, so I am unable to find the root cause from them.
JMS logging was not enabled either.
It is confirmed that those messages were processed once through the normal flow from the regular queues but failed due to an application-logic exception. I know this because the exceptions that caused the failures were logged by a different mechanism that we have in place.
I went through the WebLogic JMS docs and checked different settings in our cluster, such as the message expiration value. Our current setting is that messages never expire.
I also confirmed that there were no database crashes around the dates on which we observed these failures.
We also have logic in place to put the message back on the error queue when an application exception/java.lang.Exception occurs while processing a message from the error queue.
What could be the cause of these occasional failures? The possible reasons I could think of are as follows (though in each case I would expect WebLogic to redeliver the message and process it from the point of failure/recovery, which does not seem to be what happened):
1) When the message is not processed because the transaction in the regular queue's MDB is rolled back after an application exception, something went wrong while persisting that message to the error queue's database table.
Could it be a java.lang.Error such as StackOverflowError/OutOfMemoryError?
2) Something happened after the message was persisted in the error queue table but BEFORE the error queue MDB consumed it (which is highly unlikely).
3) Something happened while the message was being processed from the error queue, such as a java.lang.Error (StackOverflowError/OutOfMemoryError). Since we do not have any redirect mechanism for the error queue, would the failed message then have been lost?
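
To illustrate the concern in point 3, here is a minimal sketch in plain Java. It is not our actual MDB code and uses no JMS API: the class name, the method names, and the in-memory list standing in for the error queue are all hypothetical. It only shows that a handler which re-queues on java.lang.Exception does not see a java.lang.Error, so the re-queue step is skipped in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the error-queue handling; not the real MDB, no JMS involved.
class ErrorQueueCatchDemo {

    // In-memory stand-in for "put the message back on the error queue".
    static final List<String> requeued = new ArrayList<String>();

    // Simulated processing that always fails, with an Error or an Exception
    // depending on the (made-up) message prefix.
    static void process(String msg) {
        if (msg.startsWith("error:")) {
            throw new StackOverflowError("simulated java.lang.Error");
        }
        throw new IllegalStateException("simulated application exception");
    }

    // Mirrors the logic described above: only java.lang.Exception triggers a re-queue.
    static void handleCatchingException(String msg) {
        try {
            process(msg);
        } catch (Exception e) {
            requeued.add(msg);
        }
    }

    // Variant that also re-queues on java.lang.Error, then rethrows it
    // so the caller still sees the failure.
    static void handleCatchingError(String msg) {
        try {
            process(msg);
        } catch (Exception e) {
            requeued.add(msg);
        } catch (Error e) {
            requeued.add(msg);
            throw e;
        }
    }

    public static void main(String[] args) {
        handleCatchingException("exception:1");      // re-queued
        try {
            handleCatchingException("error:2");      // Error escapes, not re-queued
        } catch (Error e) {
            System.out.println("Error escaped catch (Exception): " + e);
        }
        try {
            handleCatchingError("error:3");          // re-queued, then rethrown
        } catch (Error e) {
            System.out.println("Error rethrown after re-queue: " + e);
        }
        System.out.println("Re-queued: " + requeued);
    }
}
```

In this sketch only "exception:1" and "error:3" end up in the re-queued list. Whether a re-queue performed from inside a failing onMessage call would survive a transaction rollback in the real container is a separate question that this sketch does not model.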
Any input to resolve/avoid this weird issue is highly appreciated.