posted 22 years ago
Hi,
We are using WebSphere Application Server v4.0.2 on AIX. We are running a cloned environment (both horizontal and vertical), with 4 application servers on 2 WebSphere instances. Our Oracle database is on a separate AIX box. This box is, however, shared by many other (non-Web) Oracle applications.
We have been having a problem with the Oracle DataSource ever since our application went live.
All of the database connections seem to get stuck in a 'Hung' state. The Resource Analyzer tells us that connections allocated to clients are not being returned. The strange part is that the problem is neither constant nor periodic; it appears suddenly. Sometimes the application works well for a week, and then the problem suddenly appears. Restarting the application server solves it.
Here are some typical behaviours we have observed:
1. The problem develops within 5 to 10 minutes (sometimes less). That is, the system runs well at one point, and then, within 5 to 10 minutes, it dies (or the number of database connections starts increasing dramatically).
2. WebSphere allocates connections to clients, but these connections seem to get locked up. They are never returned to the connection pool, hence the rise in the number of connections in the pool.
3. This cannot be attributed to the code, because the same code runs correctly at all other times. The problem happens only once or twice a week.
4. Generally, the total number of connections in the pool ends up maxing out.
5. Eventually, all the processes in the HTTP Server get stuck in the 'Wait' state.
6. Restarting the App Server generally destroys all the connections, which resolves the problem. This is because the problem generally happens at a single point in time. After a restart, the App Server behaves normally again (on most occasions).
Initially, we suspected that the memory on the machine hosting the database was the problem. We increased the memory, and the problem did not appear again for some time. However, after one week, it re-appeared.
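To make points 2 and 4 concrete, here is a toy model of one mechanism that produces this pattern. It is a sketch only: `ToyPool`, `leakyRequest` and `safeRequest` are hypothetical names, not WebSphere's pool or its API. It assumes that some requests occasionally hit an error. A request that releases its connection only on the success path leaks one connection per error, so the pool stays healthy most of the time and then maxes out; releasing in a `finally` block returns the connection either way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-size pool; NOT WebSphere's connection pool implementation.
class ToyPool {
    private final Deque<Object> free = new ArrayDeque<>();
    private final int max;
    private int inUse = 0;

    ToyPool(int max) {
        this.max = max;
        for (int i = 0; i < max; i++) free.push(new Object()); // stand-ins for connections
    }

    synchronized Object acquire() {
        if (free.isEmpty()) {
            throw new IllegalStateException("pool exhausted: " + inUse + "/" + max + " in use");
        }
        inUse++;
        return free.pop();
    }

    synchronized void release(Object c) {
        inUse--;
        free.push(c);
    }

    synchronized int inUse() { return inUse; }
}

public class PoolLeakDemo {
    // Leaky: release() is skipped whenever doWork() throws.
    static void leakyRequest(ToyPool pool, boolean fail) {
        Object c = pool.acquire();
        doWork(fail);
        pool.release(c);
    }

    // Safe: release() always runs, even when doWork() throws.
    static void safeRequest(ToyPool pool, boolean fail) {
        Object c = pool.acquire();
        try {
            doWork(fail);
        } finally {
            pool.release(c);
        }
    }

    // Simulated database work that occasionally fails.
    static void doWork(boolean fail) {
        if (fail) throw new RuntimeException("simulated DB error");
    }

    public static void main(String[] args) {
        ToyPool leaky = new ToyPool(4);
        ToyPool safe = new ToyPool(4);
        for (int i = 0; i < 12; i++) {
            boolean fail = (i % 3 == 2); // every third request hits an error
            // The caller swallows the error, e.g. by showing an error page.
            try { leakyRequest(leaky, fail); } catch (RuntimeException e) { }
            try { safeRequest(safe, fail); } catch (RuntimeException e) { }
        }
        System.out.println("leaky in use: " + leaky.inUse());
        System.out.println("safe in use:  " + safe.inUse());
    }
}
```

With a pool of 4 and an error on every third request, the leaky version prints `leaky in use: 4` (the pool is maxed out) while the safe version prints `safe in use:  0`.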
Can anyone help?
Thanks,
Madhu