I have a Tomcat webapp that consists of a single HttpServlet. The servlet uses a java.util.concurrent Executor to process the POST request payload. Each request usually takes very little time: the Executor hands the payload to another thread, and the original thread returns an OK response immediately. But sometimes Tomcat hangs for up to 30 seconds. I don't think the request is blocked in the application code. While Tomcat hangs, no requests are processed, but there is no error or exception either. I also don't think it's GC, because the GC collection times don't line up with the hangs. Can anyone help me work out why Tomcat hangs? The throughput is about 200 requests per second.
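For readers who want to picture the hand-off described above, here is a minimal sketch in plain Java. It is not the poster's code: the class name HandOffSketch, the accept() method standing in for doPost(), and the pool size of 4 are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the servlet: names and pool size are assumptions.
public class HandOffSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final AtomicInteger processed = new AtomicInteger();

    /** Stand-in for doPost(): queue the payload and answer right away. */
    public String accept(byte[] payload) {
        workers.execute(() -> process(payload));
        return "OK"; // the calling thread does not wait for process()
    }

    private void process(byte[] payload) {
        // Real payload handling would go here; this sketch only counts calls.
        processed.incrementAndGet();
    }

    public int processedCount() {
        return processed.get();
    }

    /** Stops accepting work and waits for already queued payloads to finish. */
    public boolean shutdown(long timeoutMillis) throws InterruptedException {
        workers.shutdown();
        return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        HandOffSketch servlet = new HandOffSketch();
        for (int i = 0; i < 3; i++) {
            servlet.accept(new byte[] {1, 2, 3});
        }
        servlet.shutdown(5000);
        System.out.println("processed=" + servlet.processedCount());
    }
}
```

With this shape the request thread only pays for the queue insertion, so a long stall is unlikely to come from the payload processing itself.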
So I am thinking of putting a timeout on Tomcat requests. Any suggestions on how to do this in general? My idea is to have a separate timer thread that tracks the elapsed time and interrupts the blocked thread. How would I extend Tomcat to do this? Thank you very much.
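One way to get the "timer plus interrupt" behaviour without a hand-written timer thread is Future.get with a timeout, followed by cancel(true). The sketch below is only an illustration of that idea in plain Java: the class name TimeoutSketch, the helper callWithTimeout, and the fallback value are assumptions, not part of Tomcat. It can only bound work that runs inside application code, so it would not cover a stall that happens before the servlet is entered, and the interrupt only helps if the task responds to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: names and timeout values are assumptions.
public class TimeoutSketch {

    /** Runs the task on the pool; returns fallback if it does not finish in time. */
    public static <T> T callWithTimeout(ExecutorService pool, Callable<T> task,
                                        long timeoutMillis, T fallback)
            throws InterruptedException {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker thread if the task is still running.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String fast = callWithTimeout(pool, () -> "OK", 1000, "TIMEOUT");
            String slow = callWithTimeout(pool, () -> {
                Thread.sleep(5000); // simulated blocked work
                return "OK";
            }, 100, "TIMEOUT");
            System.out.println(fast + " " + slow); // prints "OK TIMEOUT"
        } finally {
            pool.shutdownNow();
        }
    }
}
```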
The Servlet uses a Java concurrent Executor to process the POST request payload.
What have you monitored so far? What are you logging to help figure this out?
I agree with the replies that I should find the reason why Tomcat hangs.
Here is what I have done so far in this investigation:
1. Logging: the request doesn't hang inside the application. In other words, it hangs before entering doPost().
2. I used YourKit to profile Tomcat. When Tomcat hangs, CPU and memory look OK, and the hangs don't line up with when GC happens. I switched to ParallelGC and decreased maxPauseTime. It didn't help.
3. I increased the size of the Tomcat thread pool, and it didn't help.
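Since the stall seems to happen before doPost(), a thread dump taken while Tomcat is hung can show what the connector and worker threads are waiting on. The JDK's jstack tool does this from outside the process; the sketch below does roughly the same from inside the JVM using the standard ThreadMXBean. The class name ThreadDumpSketch and the output format are assumptions, and how the dump gets triggered (for example from a watchdog thread) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper: the name and output format are assumptions.
public class ThreadDumpSketch {

    /** Returns the name, state, and stack trace of every live thread in this JVM. */
    public static String dumpThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitor and synchronizer details to keep it cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}
```

Comparing two or three dumps taken a few seconds apart during a hang usually makes it clearer whether the threads are idle, blocked on a lock, or stuck in socket I/O.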
I am using Apache Commons HttpClient to simulate 200 requests/second. I also used JMeter to send HTTP requests. The problem happens with both clients, and it is worse with JMeter. Please note that the client runs on a single machine. Now my main suspects are:
1. Tomcat. Something inside Tomcat that I could configure or optimize.
2. Client. Maybe the clients open too many connections at one time? I checked the client machine's file descriptor limit. It's 65355.