Failover in a clustered environment fails.

Shekar Atmakur
Ranch Hand

Joined: Oct 24, 2003
Posts: 36
Hi all,
I would appreciate any thoughts and suggestions on how to fix this issue.
Briefly, we have a problem in a clustered environment where requests are not being transferred to the second managed server when the first one goes down.

The issue is with a stateful session bean that is defined as clusterable in the deployment descriptor. We cache the reference to this bean using the getHandle method and reuse it on subsequent requests. The servers are deployed on separate physical machines running WebLogic 8.1 SP4.

The log snippets below are from the environment where we specifically tested failover. Initially, requests were being sent to managed server 1. In the line below from managed server 1, please note the cached reference to the session bean.

2006.08.08 17.21.55.822 ALL [ExecuteThread: '11' for queue: 'weblogic.kernel.Default']: ServiceController: processInformation(map): ******* { hhEveningTelephone_Id=225160, hhEarnedPast=, yearhhDueDate=yyyy,
perf1=ejb/SessionManager:t3://xxx.xxx.xxx.16:8080,xxx.xxx.xxx.15:8080#21125316336418816,
HTTP_REQUEST_METHOD=POST, hhCoveredInsurance=, hhNumberExpected=, hhDueDate=, APP_NUM=359589}


Managed server 1 was then brought down to test failover. The log below from the managed server 2 console confirms this.

####<Aug 8, 2006 5:22:12 PM EDT> <Info> <Cluster> <vgstage2.hhs.state.ma.us> <IntakeManagedServer_2> <ExecuteThread: '4' for queue: 'weblogic.kernel.System'> <<WLS Kernel>> <> <BEA-000113> <Removing IntakeManagedServer_1 jvmid:3948319519780706293S:xxx.xxx.xxx.16:[8080,8080,8443,8443,8080,8443,-1,0,0]:xxx.xxx.xxx.15:8080,xxx.xxx.xxx.16:8080:AppSvcSP4Domain:IntakeManagedServer_1 from cluster view due to PeerGone.>

Now, when the next request comes to managed server 2, I see that the reference object for the session bean tries to connect to managed server 1 and fails as shown below, when it should actually connect to managed server 2.
If we had seen a naming exception or something of that sort, it would have indicated a clustering problem, possibly caused by application code.
But since the reference object has information about both managed servers and is unable to connect to either, I tend to think it has more to do with BEA behaving improperly.

2006.08.08 17.22.29.589 ALL [ExecuteThread: '12' for queue: 'weblogic.kernel.Default']: eRecord error Description is: java.lang.Exception: java.rmi.ConnectException: Could not establish a connection with 3948319519780706293S:xxx.xxx.xxx.15:[8080,8080,8443,8443,8080,8443,-1,0,0]:xxx.xxx.xxx.15:8080,xxx.xxx.xxx.16:8080:AppSvcSP4Domain:IntakeManagedServer_1,
java.rmi.ConnectException: Destination unreachable; nested exception is: java.net.ConnectException: Connection refused; No available router to destination



Thanks,
Shekar

[ August 30, 2006: Message edited by: Shekar Atmakur ]
Cameron Wallace McKenzie
author and cow tipper
Saloon Keeper

Joined: Aug 26, 2006
Posts: 4968
    

There's a lot of information there.

I'm going to make a very bold assumption - the problem is with the code, not the application server. If that's a valid assumption, the question is "where?"

One thing I have seen is incorrect use of JNDI. EJBs get clustered, but applications either use topology-dependent names or perform local lookups. If a JNDI lookup is pinned to a particular server and that server fails, the EJB won't be workload managed, even if another server is waiting to take the request.
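
To make the point about topology-dependent lookups concrete, here is a minimal sketch of a JNDI environment that lists every cluster member in the provider URL instead of a single server. The host names are hypothetical, and the sketch assumes the WebLogic client classes are on the classpath when the context is actually created; it only builds and prints the environment.

```java
import java.util.Hashtable;
import javax.naming.Context;

class ClusterJndiEnv {
    // Builds a JNDI environment whose provider URL names every cluster member,
    // so the initial lookup is not tied to one server.
    static Hashtable<String, String> clusterEnv(String... hostPorts) {
        Hashtable<String, String> env = new Hashtable<>();
        // WebLogic's initial context factory; resolved only when a context is created.
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        // Comma-separated t3 address list, one entry per managed server.
        env.put(Context.PROVIDER_URL, "t3://" + String.join(",", hostPorts));
        return env;
    }

    public static void main(String[] args) {
        // Hypothetical host names, for illustration only.
        Hashtable<String, String> env =
                clusterEnv("host1.example.com:8080", "host2.example.com:8080");
        System.out.println(env.get(Context.PROVIDER_URL));
        // With WebLogic on the classpath, the lookup context would be created with:
        // Context ctx = new javax.naming.InitialContext(env);
    }
}
```

A provider URL with a single host would look the same apart from the address list, which is why this kind of mistake is easy to miss in a code review.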

This is TOTALLY a shot in the dark. If I'm way off, just dismiss it.

There's a good tutorial on JNDI naming on my website, and it talks about a federated namespace, and the danger of local lookups. It might help:

http://www.technicalfacilitation.com/get.php?link=naming
Lin Feng
Ranch Hand

Joined: Dec 11, 2002
Posts: 142
I do not think it is a JNDI issue.

WebLogic supports clustering at both the home and the bean level. JNDI usually only causes trouble at the home level, unless the bean interface itself is bound in the JNDI directory ...


From the log we can see that the bean stub at least knows the bean is clusterable. It may log an exception when it tries to send a new request, because the RMI connection is closed quickly. After it finds that the primary server is not available, it should fail over to the second one.

From the description, what I read is that the failover did not happen ...
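
The behavior expected from the stub can be sketched as a loop that moves to the next server only when the current one refuses the connection. This is an illustration of the idea, not WebLogic's replica-aware stub: the ServerCall interface and the callWithFailover helper are hypothetical, and a real stateful bean also needs its state replicated to the secondary server, which this sketch does not cover.

```java
import java.rmi.ConnectException;
import java.rmi.RemoteException;
import java.util.Arrays;
import java.util.List;

class FailoverSketch {
    /** Hypothetical stand-in for a remote call made against one named server. */
    interface ServerCall<T> {
        T invoke(String server) throws RemoteException;
    }

    // Tries each server in order and moves on only when the connection fails.
    // Any other RemoteException is passed straight to the caller.
    static <T> T callWithFailover(List<String> servers, ServerCall<T> call)
            throws RemoteException {
        ConnectException last = null;
        for (String server : servers) {
            try {
                return call.invoke(server);
            } catch (ConnectException e) {
                last = e; // this server is unreachable, try the next one
            }
        }
        if (last != null) {
            throw last; // every server refused the connection
        }
        throw new ConnectException("No servers configured");
    }

    public static void main(String[] args) throws RemoteException {
        List<String> servers = Arrays.asList("IntakeManagedServer_1", "IntakeManagedServer_2");
        String result = callWithFailover(servers, server -> {
            if (server.equals("IntakeManagedServer_1")) {
                // Simulates the server that was brought down in the test.
                throw new ConnectException("Destination unreachable");
            }
            return "handled by " + server;
        });
        System.out.println(result); // prints "handled by IntakeManagedServer_2"
    }
}
```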

Lin

[ August 30, 2006: Message edited by: Lin Feng ]
Mahesh Bhatt
Ranch Hand

Joined: Sep 15, 2004
Posts: 88
Whatever the reason may be, we first need to see whether the second server is ready to take new requests. That is, we need to check whether this server is accessible from the machine where the client is running.
Are you able to ping this server using the weblogic.Admin command?
If you can ping it, then please check the code: does it have the correct URL for the second server?
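
As a rough complement to the weblogic.Admin ping, a plain TCP connection test run from the client machine can show whether the listen port is reachable at all. This is only a sketch: the host name is hypothetical, and a successful TCP connect says nothing about whether the server is in a running state or accepting t3 traffic.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unknown hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical host name; 8080 is the listen port that appears in the logs.
        String host = args.length > 0 ? args[0] : "managed-server-2.example.com";
        System.out.println(host + ":8080 reachable = " + canConnect(host, 8080, 3000));
    }
}
```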


Impossible is I M Possible
Shekar Atmakur
Ranch Hand

Joined: Oct 24, 2003
Posts: 36
Thanks guys,
Your thoughts are appreciated.
In response to your questions:
A> Both servers respond properly to the multicast test suggested by BEA.
B> The remote reference seems to have the correct information about both managed servers.

Also, I found an interesting article at the following URL, and I think it may have something to do with the issue I have.
Clustering Best Practices

Please go to the end of the page and read the section "Firewall Considerations".
Please let me know your thoughts.


thanks,
shekar
[ September 05, 2006: Message edited by: Shekar Atmakur ]
 