Hi all, I would appreciate your thoughts and any suggestions for fixing this issue. Briefly, we have a problem in a clustered environment where requests are not being transferred to the second managed server when the first one goes down.
The issue is with a STATEFUL session bean that is defined as clusterable in the deployment descriptor. We cache the reference to this bean using the getHandle method and reuse it on future requests. The servers are deployed on separate physical machines running WebLogic 8.1 SP4.
The log snippets below are from the environment where we specifically tested failover. Initially, requests were being sent to Managed server1. In the line below from Managed server1, please note the cached reference to a session bean.
Managed server1 was then brought down to test failover. The log below from the Managed server2 console indicates the same.
####<Aug 8, 2006 5:22:12 PM EDT> <Info> <Cluster> <vgstage2.hhs.state.ma.us> <IntakeManagedServer_2> <ExecuteThread: '4' for queue: 'weblogic.kernel.System'> <<WLS Kernel>> <> <BEA-000113> <Removing IntakeManagedServer_1 jvmid:3948319519780706293S:xxx.xxx.xxx.16:[8080,8080,8443,8443,8080,8443,-1,0,0]:xxx.xxx.xxx.15:8080,xxx.xxx.xxx.16:8080:AppSvcSP4Domain:IntakeManagedServer_1 from cluster view due to PeerGone.>
Now, when the next request comes to Managed server2, I see that the reference object for the session bean tries to connect to Managed server1 and fails as shown below, when it should actually connect to Managed server2. If we had seen a naming exception or something of that sort, it would have indicated an issue in clustering, possibly due to application code. But since the reference object has information about both managed servers and is unable to connect to either, I tend to think it has more to do with BEA behaving improperly.
2006.08.08 126.96.36.1999 ALL [ExecuteThread: '12' for queue: 'weblogic.kernel.Default']: eRecord error Description is: java.lang.Exception: java.rmi.ConnectException: Could not establish a connection with 3948319519780706293S:xxx.xxx.xxx.15:[8080,8080,8443,8443,8080,8443,-1,0,0]:xxx.xxx.xxx.15:8080,xxx.xxx.xxx.16:8080:AppSvcSP4Domain:IntakeManagedServer_1, java.rmi.ConnectException: Destination unreachable; nested exception is: java.net.ConnectException: Connection refused; No available router to destination
[ August 30, 2006: Message edited by: Shekar Atmakur ]
I'm going to make a very bold assumption - the problem is with the code, not the application server. If that's a valid assumption, the question is "where?"
One thing I have seen is incorrect use of JNDI. EJBs get clustered, but applications either use topology-dependent names or perform local lookups. If a JNDI lookup goes to one specific server and that server fails, the EJB won't be workload managed, even if another server is waiting to take the request.
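To make the lookup point concrete, here is a rough sketch of a JNDI environment that names every cluster member instead of a single server. The host names are hypothetical placeholders, the helper class is mine, and it uses current Java syntax for brevity, so treat it as an illustration rather than what the original application does.

```java
import java.util.Hashtable;
import javax.naming.Context;

public class ClusterJndiEnv {

    /**
     * Builds JNDI environment properties whose provider URL lists every
     * cluster member, so the lookup is not tied to one server.
     * Each argument is a "host:port" pair.
     */
    public static Hashtable<String, String> clusterEnvironment(String... hostPorts) {
        if (hostPorts.length == 0) {
            throw new IllegalArgumentException("at least one host:port is required");
        }
        Hashtable<String, String> env = new Hashtable<>();
        // WebLogic's initial context factory class name.
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        // Comma-separated address list: t3://host1:port,host2:port
        env.put(Context.PROVIDER_URL, "t3://" + String.join(",", hostPorts));
        return env;
    }

    public static void main(String[] args) {
        // Hypothetical host names, for illustration only.
        Hashtable<String, String> env =
                clusterEnvironment("server1.example.com:8080", "server2.example.com:8080");
        System.out.println(env.get(Context.PROVIDER_URL));
    }
}
```

The resulting table would be passed to new InitialContext(env) with the WebLogic client classes on the classpath. A cluster DNS name that resolves to all members is another way to avoid a topology-dependent URL.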
This is TOTALLY a shot in the dark. If I'm way off, just dismiss it.
There's a good tutorial on JNDI naming on my website, and it talks about a federated namespace, and the danger of local lookups. It might help:
WebLogic supports clustering at both the home level and the bean level. Usually JNDI only causes trouble at the home level, unless the bean interface itself is put in the JNDI directory ...
From the log we can see that the bean stub at least knows the bean is clusterable. It may log an exception when it tries to send a new request, because the RMI connection is closed quickly. After it finds that the primary server is not available, it should fail over to the second one.
From what I read in the description, the failover did not happen ...
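If the stub really does not fail over, one caller-side workaround is to treat the cached reference as possibly stale: try it first, and on a java.rmi.ConnectException obtain a fresh reference. The sketch below is generic and uses made-up names; the two Callable arguments stand in for "invoke through the cached handle" and "look up again and invoke", neither of which is taken from the original code. Note that the log above shows the ConnectException wrapped in a java.lang.Exception, so real code might need to unwrap the cause first.

```java
import java.rmi.ConnectException;
import java.util.concurrent.Callable;

public class StaleReferenceFallback {

    /**
     * Runs the call through the cached reference; if the server behind it
     * is unreachable, retries once through a freshly obtained reference.
     */
    public static <T> T callWithFallback(Callable<T> viaCachedReference,
                                         Callable<T> viaFreshLookup) throws Exception {
        try {
            return viaCachedReference.call();
        } catch (ConnectException e) {
            // Cached reference points at a dead server: obtain a new one.
            return viaFreshLookup.call();
        }
    }

    public static void main(String[] args) throws Exception {
        String result = StaleReferenceFallback.<String>callWithFallback(
                () -> { throw new ConnectException("Destination unreachable"); },
                () -> "answered by the surviving server");
        System.out.println(result);
    }
}
```

For a stateful bean, a fresh lookup followed by create gives a new bean instance, so any conversational state that was not replicated to the second server is lost. This only keeps the request from failing outright.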
[ August 30, 2006: Message edited by: Lin Feng ]
Whatever the reason may be, we first need to see whether the second server is ready to take new requests. That is, we need to check whether this server is accessible from the machine where the client is running. Are you able to ping this server using the weblogic.Admin command? If you can ping it, then please check the code: does it have the correct URL for the second server?
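If the WebLogic tools are not handy on the client machine, a plain TCP check from that machine is a rough first test. This is only a sketch: the class and method names are my own, and it only shows that the port accepts connections, which is weaker than a real weblogic.Admin ping.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PortCheck <host> <port>
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

Run it from the machine that hosts the client, once for each managed server's listen address and port. A false result for the second server would point at the network or a firewall rather than the bean code.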
Thanks guys, your thoughts are appreciated. In response to your questions: A> Both servers are responding properly to a multicast test suggested by BEA. B> The remote reference seems to have the correct information about both managed servers.
Also, I found an interesting article at the following URL and think it may have something to do with the issue I have: Clustering Best Practices
Please go to the end of the page and read the section under "Firewall Considerations". Please let me know your thoughts.