We have a server application that listens on a port for client connections. We are in the process of building a fail-over mechanism into it. We want to run two instances of this application on two machines, one in active mode and the other in passive mode. When the active server goes down for whatever reason, the passive one picks up and starts servicing clients.
We were able to achieve notifications between the two servers without any problems, but we are having a bit of a problem detecting the switch on the clients. If anybody could guide me in the right direction, that would be very helpful.
Here's my server thread's run method.
Whenever the server receives a client connection, we add the connection to a List. When the server receives a de-activate notification, we close all the sockets in the list and clear the list.
After the server has been de-activated, it should not accept any more connections, forcing the client to try to establish a connection to the other server. The client has this logic: if it cannot establish a socket connection within 3 attempts, it immediately tries to connect to the other server.
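To make the client-side retry logic concrete, here is a minimal sketch of "try each server a few times, then move on to the next one". The class name FailoverConnector, its constructor parameters, and the connect timeout are all made up for illustration; this is not the actual client code from our application.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

// Hypothetical helper: tries each server in order, a fixed number of times each.
final class FailoverConnector {
    private final List<InetSocketAddress> servers; // preferred server first, then the other one
    private final int attemptsPerServer;           // e.g. 3, as described in the post
    private final long retryDelayMillis;           // pause between failed attempts
    private final int connectTimeoutMillis;        // assumed value, not from the post

    FailoverConnector(List<InetSocketAddress> servers, int attemptsPerServer,
                      long retryDelayMillis, int connectTimeoutMillis) {
        this.servers = servers;
        this.attemptsPerServer = attemptsPerServer;
        this.retryDelayMillis = retryDelayMillis;
        this.connectTimeoutMillis = connectTimeoutMillis;
    }

    /** Returns the first socket that connects, or throws if every attempt fails. */
    Socket connect() throws IOException, InterruptedException {
        IOException last = null;
        for (InetSocketAddress server : servers) {
            for (int attempt = 1; attempt <= attemptsPerServer; attempt++) {
                Socket socket = new Socket();
                try {
                    socket.connect(server, connectTimeoutMillis);
                    return socket;
                } catch (IOException e) {
                    last = e;
                    closeQuietly(socket);
                    // Space the attempts out, but do not sleep before switching servers.
                    if (attempt < attemptsPerServer) {
                        Thread.sleep(retryDelayMillis);
                    }
                }
            }
        }
        throw new IOException("No server accepted a connection", last);
    }

    private static void closeQuietly(Socket socket) {
        try {
            socket.close();
        } catch (IOException ignored) {
            // Nothing useful to do if closing a failed socket also fails.
        }
    }
}
```

Note that this only helps if a connection attempt to the de-activated server actually fails, which is exactly the part that is not working for us.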
This hasn't been working as expected: when I try to establish a connection to the passive server, the connection succeeds instead of failing.
Would it be a better idea for the server to accept the client connection, write its status, and let the client decide whether to keep the connection or switch based on that status? On the server side, of course, we can close the socket after writing the status info if the status is de-active.
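Roughly, the status idea could look like the sketch below. The one-line ACTIVE/PASSIVE greeting, the class name StatusHandshake, and both method names are assumptions made up for this example, not an existing protocol in our application.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical one-line greeting: the server says ACTIVE or PASSIVE right after accept().
final class StatusHandshake {
    static final String ACTIVE = "ACTIVE";
    static final String PASSIVE = "PASSIVE";

    /** Server side: writes the status line; closes the socket if this server is not active. */
    static boolean greet(Socket client, AtomicBoolean active) throws IOException {
        boolean isActive = active.get();
        OutputStream out = client.getOutputStream();
        out.write(((isActive ? ACTIVE : PASSIVE) + "\n").getBytes(StandardCharsets.US_ASCII));
        out.flush();
        if (!isActive) {
            client.close(); // de-active: tell the client, then drop the connection
        }
        return isActive;
    }

    /** Client side: reads the status line and reports whether to keep this connection. */
    static boolean serverIsActive(Socket server) throws IOException {
        InputStream in = server.getInputStream();
        StringBuilder line = new StringBuilder();
        int b;
        // Read byte by byte so nothing after the greeting is consumed by accident.
        while ((b = in.read()) != -1 && b != '\n') {
            line.append((char) b);
        }
        return ACTIVE.equals(line.toString());
    }
}
```

The client would call serverIsActive right after connecting and move on to the other server when it returns false. A connection that is closed before any greeting arrives also comes back as false here.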
Any suggestion is greatly appreciated.
Thanks for your time. Cnu. [ March 24, 2006: Message edited by: cnu sri ]
An active server needs to break out of this loop and close the ServerSocket when it goes out of service, doesn't it? Of course if it seriously crashes and terminates the JVM that's pretty well closed, too.
The standby might not enter this loop until it is called into action. That way nobody can connect to it before it's really ready. It might need to open a different port for inter-server communication. Of course if the active server crashes the VM it will never send the standby the signal to start accepting connections.
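Here is a rough sketch of what I mean. As long as a ServerSocket stays open, the operating system will normally complete incoming connections into its backlog even when nobody is calling accept(), which would explain why your clients can still connect to a de-activated server. The class and method names below (FailoverListener, activate, deactivate) are invented for the example, and I am assuming your notification callback can call them.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener that only holds the port open while this instance is active.
final class FailoverListener {
    private final int port;
    private final List<Socket> clients = new CopyOnWriteArrayList<>();
    private ServerSocket serverSocket; // null while passive; guarded by "this"

    FailoverListener(int port) {
        this.port = port;
    }

    /** Call when this instance becomes active: open the port and start accepting. */
    synchronized void activate() throws IOException {
        if (serverSocket != null) {
            return; // already active
        }
        ServerSocket ss = new ServerSocket(port);
        serverSocket = ss;
        new Thread(() -> acceptLoop(ss), "accept-loop").start();
    }

    /** Call when this instance goes passive: close the port, then drop all clients. */
    synchronized void deactivate() throws IOException {
        ServerSocket ss = serverSocket;
        if (ss == null) {
            return; // already passive
        }
        serverSocket = null;
        ss.close(); // a thread blocked in accept() gets a SocketException
        for (Socket client : clients) {
            try {
                client.close();
            } catch (IOException ignored) {
                // Keep closing the remaining clients.
            }
        }
        clients.clear();
    }

    /** The bound port, or -1 while passive. */
    synchronized int localPort() {
        return serverSocket == null ? -1 : serverSocket.getLocalPort();
    }

    private void acceptLoop(ServerSocket ss) {
        try {
            while (true) {
                Socket client = ss.accept();
                clients.add(client);
                if (ss.isClosed()) { // de-activated while we were accepting
                    clients.remove(client);
                    client.close();
                }
                // A real server would hand the client to a worker thread here.
            }
        } catch (IOException e) {
            // Expected once deactivate() closes the ServerSocket; the loop simply ends.
        }
    }
}
```

With the port closed, a connection attempt to the passive machine should be refused outright, so the client's three tries fail quickly and it moves on to the other server.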
I like your three tries choice. Make sure to space those out a little with sleep. We had something much like this in an old system and it was a bit of a challenge to avoid false failover on a minor network burp. How can you tell if your primary server is not merely offline and really most sincerely dead?
This stuff is tricky, no? Have you considered a hardware solution like an ArrowPoint box? You could load balance and utilize both servers instead of having an expensive box sit idle, or do failover as you described.
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Joined: Jan 23, 2004
Thanks very much for your ideas and time.
Originally posted by Stan James: An active server needs to break out of this loop and close the ServerSocket when it goes out of service, doesn't it? Of course if it seriously crashes and terminates the JVM that's pretty well closed, too.
My apologies for not providing clear information on this one. The fail-over between servers is achieved using TIBCO's subscription. Both of these servers are configured to be in a group. We get a notification via the callback. This has been working fairly consistently so far.
Yes, I think it sounds like a much better choice to kill the server socket thread entirely on de-activation, and start it on activation.
For the load balancing, I think we have to do a little more work, as we have to come up with a strategy to avoid duplicate message consumption. We are not taking this up as of now, but we will surely have to consider it in the future as the volume grows.
Yes, we are spacing our connection attempts with a sleep of 500ms.
Regarding the ArrowPoint box, I haven't thought about it so far, but I will surely investigate the possibility as time permits.
Once again, thanks a million for the ideas.
Originally posted by Stan James: How can you tell if your primary server is not merely offline and really most sincerely dead?
This is one of the problems we faced in the past: we had a deadlock in the application. Once the application deadlocked, it did nothing. Since the application never generated any signal, the passive server kept running in passive mode while the active server did nothing.
Actually, Stan, could you please shed some light on what care should be taken to survive such situations?
It used to be a nightmare to keep watching the logs to see whether the application was really processing anything and, if it had stopped, to manually kill the app to force the other one to pick up. Life is much better after fixing that deadlock.
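One thing we could try, to automate the log-watching described above, is a simple heartbeat: the processing thread records a timestamp whenever it makes progress, and a separate watchdog checks whether that timestamp has gone stale. The sketch below is only an idea under my own assumptions; the Heartbeat class and the threshold are invented, and time is passed in explicitly to keep the example easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stall detector: no beat for longer than maxSilenceMillis counts as stalled.
final class Heartbeat {
    private final long maxSilenceMillis;
    private final AtomicLong lastBeatMillis;

    Heartbeat(long maxSilenceMillis, long nowMillis) {
        this.maxSilenceMillis = maxSilenceMillis;
        this.lastBeatMillis = new AtomicLong(nowMillis);
    }

    /** Called by the processing thread each time it makes progress. */
    void beat(long nowMillis) {
        lastBeatMillis.set(nowMillis);
    }

    /** Called by a watchdog thread; true if the last beat is older than the threshold. */
    boolean isStalled(long nowMillis) {
        return nowMillis - lastBeatMillis.get() > maxSilenceMillis;
    }
}
```

The processing loop would call beat(System.currentTimeMillis()) after each message, and also on each idle poll so that a quiet period is not mistaken for a hang. A watchdog thread could then terminate the process when isStalled returns true, which is the same thing we used to do by hand to force the other server to pick up.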