I've been banging my head against this one for a while. We have a system that uses a socket protocol to communicate between our website and the back end. Most of the time it works fine, but one in every 20 to 30 calls to the server gets dropped. The protocol has an intermediary that tests for availability of the server; then the actual request is sent to the server. When a call is dropped, we see that the first call by the intermediary hits the server, but the second call never reaches the requestHandler. In the log, one of the calls from the web application simply doesn't appear in the handler log. It doesn't throw an exception; it looks like it just disappears. Then, the next time a call is made to the server, the server code works fine. The code goes something like this, with a bunch of logging and exception handling in between...
What am I missing? What should I do to get this to be rock solid reliable?
Find whoever wrote that and smack 'em. The code is binding and unbinding from the server socket with each iteration through the while loop, so there's a period of time when nobody is listening for connections. It should look something like this:
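The original snippet isn't reproduced here, but a minimal sketch of the fix would look something like this (class and method names such as EchoServer are illustrative, not from the original code): the ServerSocket is created once, before the loop, so the listener stays bound for the life of the server, and only accept() and per-connection work happen inside the loop.

```java
import java.io.*;
import java.net.*;

// Illustrative sketch: bind the ServerSocket ONCE, outside the loop,
// so there is never a window with nobody listening for connections.
public class EchoServer {
    private final ServerSocket serverSocket;
    private volatile boolean running = true;

    public EchoServer(int port, int backlog) throws IOException {
        // Created once, up front; never closed and re-opened per request.
        serverSocket = new ServerSocket(port, backlog);
    }

    public int getLocalPort() { return serverSocket.getLocalPort(); }

    public void serve() {
        while (running) {
            // Only accept() and the per-connection work live inside the loop.
            try (Socket client = serverSocket.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                out.println(in.readLine());  // echo one line back, then close
            } catch (IOException e) {
                if (running) e.printStackTrace();
            }
        }
    }

    public void shutdown() throws IOException {
        running = false;
        serverSocket.close();  // unblocks a blocked accept()
    }
}
```

While one request is being handled the socket isn't accepting, so simultaneous clients queue in the backlog instead of vanishing; handing the accepted Socket to another thread (discussed below in the thread) removes even that window.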
BTW, what is this.serverBacklog set to? That parameter determines how many connections are allowed to be waiting to be accepted. If more than that number of clients attempt to connect simultaneously, some will fail.
I've moved the socket creation outside the while loop (and removed the close and socket = null), and I'm still seeing the socket drop calls.
The backlog is intentionally set to 1 - there are multiple threads that all have this code running on different ports. We use this technique to manage availability of the machine for calls that take varying amounts of time. The twin hits to the port allow us to route to another port if that thread is busy. This gives us some flexibility in managing multiple servers and multiple back ends...
the backlog is intentionally set to 1 - there are multiple threads that all have this code running on different ports.
My gut reaction is "smack 'em again". You have multiple processes bound to different ports, and the intermediary process determines which one is free and dispatches a request to it? The more conventional way to use TCP/IP is to have a single server process bound to a single port which creates a thread to handle each incoming request. When a client connection is accepted, it is transferred to another open port by the underlying network code and the server port is free to handle another request. We see this in the accept() call, which returns a new Socket instance. Whoever wrote this code is apparently trying to solve a problem that Java networking doesn't have (i.e. an inability to handle multiple simultaneous requests). Have you looked at the code for the intermediary?
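The conventional design described here might be sketched as follows (class and method names are illustrative, not from the poster's code): one ServerSocket on one port, with accept() handing each new connection off to its own worker thread so the listener is immediately free again.

```java
import java.io.*;
import java.net.*;

// Illustrative sketch of the conventional design: one port, one listener,
// one worker thread per accepted connection.
public class ThreadPerConnectionServer {
    private final ServerSocket serverSocket;
    private volatile boolean running = true;

    public ThreadPerConnectionServer(int port) throws IOException {
        // A generous backlog lets bursts of clients queue rather than fail.
        serverSocket = new ServerSocket(port, 50);
    }

    public int getLocalPort() { return serverSocket.getLocalPort(); }

    public void serve() {
        while (running) {
            try {
                // accept() returns a NEW Socket per client; the listening
                // port is free again as soon as we hand the socket off.
                Socket client = serverSocket.accept();
                new Thread(() -> handle(client)).start();
            } catch (IOException e) {
                if (running) e.printStackTrace();
            }
        }
    }

    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            out.println("ACK " + in.readLine());  // trivial request handler
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void shutdown() throws IOException {
        running = false;
        serverSocket.close();  // unblocks a blocked accept()
    }
}
```

With this shape there is no need for an availability-probing intermediary at all: the OS backlog and the per-connection threads absorb simultaneous requests on a single port.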
Originally posted by Joe Ess: a single server process bound to a single port which creates a thread to handle each incoming request
I agree with Joe, but I'll add that another way, which gives you more flexibility in tuning, is to hand off the newly accepted Socket to a work queue processed by a thread pool. For one, creating threads is somewhat time-consuming (not nearly as much as processes, but still noticeable), so there's no reason to keep killing and recreating them.
More importantly, the use of a queue would allow you to basically do what (I assume) the request handler is doing now: prioritizing requests and optionally refusing certain ones if overloaded. If all it's doing is choosing a free port then you can eliminate it entirely and use a simple FIFO queue.
JDK 1.5's java.util.concurrent has all the classes you need, and if you're pre-1.5 that package comes from Doug Lea's excellent Concurrent library. PooledExecutor is the thread pool, and Channel is the queue interface. I used that package with minor alterations as the core of a filtering HTTP/S proxy five years ago.
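In JDK 1.5 terms, the same idea might look like the sketch below (class names like PooledServer are illustrative). ThreadPoolExecutor and BlockingQueue are the java.util.concurrent descendants of Doug Lea's PooledExecutor and Channel; the bounded queue is what lets you refuse work when overloaded, since a full ArrayBlockingQueue makes the executor reject the task by default.

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.*;

// Illustrative sketch: accepted sockets go onto a bounded work queue
// consumed by a fixed pool of reusable worker threads.
public class PooledServer {
    private final ServerSocket serverSocket;
    private final ExecutorService pool;
    private volatile boolean running = true;

    public PooledServer(int port, int poolSize) throws IOException {
        serverSocket = new ServerSocket(port, 50);
        // Bounded queue: up to 100 requests wait here; beyond that the
        // executor throws RejectedExecutionException (i.e. refuses work).
        pool = new ThreadPoolExecutor(poolSize, poolSize,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(100));
    }

    public int getLocalPort() { return serverSocket.getLocalPort(); }

    public void serve() {
        while (running) {
            try {
                final Socket client = serverSocket.accept();
                pool.execute(() -> handle(client));  // enqueue, don't spawn
            } catch (IOException e) {
                if (running) e.printStackTrace();
            }
        }
    }

    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            out.println(in.readLine());  // echo: stand-in for real work
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void shutdown() throws IOException {
        running = false;
        serverSocket.close();
        pool.shutdown();
    }
}
```

Swapping ArrayBlockingQueue for a PriorityBlockingQueue is one way to get the request prioritization mentioned above without any intermediary process.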
Of course, for maximum throughput you should consider migrating to NIO, but that's a lot more work as the paradigm is entirely different.
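To give a flavor of how different the NIO paradigm is, here is a minimal single-threaded sketch (names illustrative; a production server would also need to handle partial writes, per-connection buffers, and errors): one Selector multiplexes the listening channel and every client channel, so no thread ever blocks on a single socket.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Illustrative NIO sketch: one thread, one Selector, many connections.
public class NioEchoServer {
    private final Selector selector;
    private final ServerSocketChannel serverChannel;
    private volatile boolean running = true;

    public NioEchoServer(int port) throws IOException {
        selector = Selector.open();
        serverChannel = ServerSocketChannel.open();
        serverChannel.bind(new InetSocketAddress(port));
        serverChannel.configureBlocking(false);
        serverChannel.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int getLocalPort() { return serverChannel.socket().getLocalPort(); }

    public void serve() throws IOException {
        try {
            while (running) {
                selector.select(200);  // wait up to 200 ms for events
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = serverChannel.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(1024);
                        if (client.read(buf) == -1) { client.close(); continue; }
                        buf.flip();
                        client.write(buf);  // echo back (assumes one write suffices)
                    }
                }
            }
        } finally {
            serverChannel.close();
            selector.close();
        }
    }

    public void shutdown() {
        running = false;
        selector.wakeup();  // unblocks select() so serve() can exit
    }
}
```

Note there is no per-connection thread at all, which is why existing blocking-IO request handlers can't simply be dropped in.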