Ours is a mammoth J2EE N-tier enterprise application built on the WebSphere 3.5.6 platform (Solaris OS), with multiple clones running on different physical machines (horizontally and vertically scaled). It's a very transaction-intensive application with highly demanding availability and performance requirements.
At times we see a clone get auto-recycled (it dies and comes back on its own within a minute). Under what circumstances does a clone recycle like this?
From our preliminary analysis, we could not find any reason for this to happen. Traffic to the application was below normal levels, and there was no DB contention, no OSE buildup, etc.
Please let us know if anyone else has experienced the same.
Look around for a possible Java core dump. In the case of a crash, the JVM may have dumped core in the working directory or somewhere in the $WAS_HOME tree. I don't remember much about v3.5.6, but I think the nanny process was already part of it, and it may be what is restarting the clone.
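If it helps, here is a rough Python sketch of that search. It walks a directory tree and lists recently modified files whose names look like dumps. The file-name patterns (`core`, `core.*`, `javacore*`) and the 24-hour window are my assumptions, not something I have verified against v3.5.6 on Solaris, and `find_dumps` is just a name I made up, so adjust to whatever your JVM actually writes.

```python
import fnmatch
import os
import time

# Assumed dump file-name patterns; change these to match your JVM and OS.
DUMP_PATTERNS = ("core", "core.*", "javacore*")


def find_dumps(root, max_age_hours=24.0, patterns=DUMP_PATTERNS):
    """Return (path, mtime) pairs for dump-like files under root, newest first."""
    cutoff = time.time() - max_age_hours * 3600
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not any(fnmatch.fnmatch(name, p) for p in patterns):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file disappeared or is unreadable; skip it
            if mtime >= cutoff:
                hits.append((path, mtime))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Falls back to the current directory if WAS_HOME is not set.
    root = os.environ.get("WAS_HOME", ".")
    for path, mtime in find_dumps(root):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, path)
```

Run it once against $WAS_HOME and once against the clone's working directory, then compare the timestamps of anything it finds with the times the clone recycled.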
Since it happens on only one clone, it may be worth turning on tracing, even in production, and working from there. With such a "mammoth" enterprise application, I'm surprised your company is still running v3.5.6. That makes it difficult to obtain official support from IBM.
In any case, it might also be worth backing up the configuration, then removing and re-creating the clone. With the configuration repository in a database on v3.5, there were occasionally corruptions that destabilized the system. Those are tough to track down, hence the remove/re-create approach.