We recently upgraded our connection pool manager to the version that is provided in Tomcat 6 (DBCP). However, we are plagued with an issue where the connection pool reaches its maxActive limit and then fails to release those connections back to the pool. (We do have removeAbandoned="true" and the removeAbandonedTimeout="30"). Eventually all of our web pages stall, and we have to restart Tomcat. We tried setting a maxWait value, but instead of a loading stall we get a lot of "pool error Timeout" exceptions and broken web pages. This issue also occurred in our previous connection pool before DBCP, so it seems like a problem that occurs no matter which connection pool implementation we use. The problem happens randomly; once or twice every couple of days. We have periods where this isn't an issue for weeks, and then times when we have to restart Tomcat a few times in one day.
I'm at a loss on how to track this down. We have turned on the stacktrace logging and fixed every abandoned connection found, but the issue still remains. I have even adding custom logging that will print a stacktrace whenever the connection pool reaches a new maximum size, but I wasn't able to determine anything from that. I also occasionally dig through the access logs during these episodes, but it feels like trying to find a needle in a haystack; also, all the page hits seem to be to pages that handle their connections properly.
So my questions are:
1) Could this some sort of thread-lock issue?
2) Any recommendations for pinpointing the problem?
3) Should we switch to yet a different connection pool manager (new tomcat-jdbc, c3p0, etc.)?
If you have enabled "removeAbandoned" then it is possible that a connection is reclaimed by the pool because it is considered to be abandoned. This mechanism is triggered when (getNumIdle() < 2) and (getNumActive() > getMaxActive() - 3)
For example maxActive=20 and 18 active connections and 1 idle connection would trigger the "removeAbandoned". But only the active connections that aren't used for more then "removeAbandonedTimeout" seconds are removed, default (300 sec). Traversing a resultset doesn't count as being used.
Please let me know, if you had figured out what the problem was in your case and the solution you implemented, if any.