I'm trying to create a communications manager module for my application. It is responsible for sending and receiving messages between nodes in the network using TCP and persistent sockets (in-order, confirmed delivery is a requirement). Each node opens a ServerSocket immediately to receive messages. A socket is created the first time a message is sent to a given destination, and its PrintWriter is stored in a HashMap so it can be retrieved the next time a message is sent to the same node.
The problem occurring frequently is that while one node may send and receive messages as it should, the other one may miss out on some messages or not receive any at all (though it can still send messages which are properly received by the first node). No exceptions are shown to indicate any problems.
The code for sending messages is as follows (ignore that the loops or sockets are not terminated):
I've tried a simpler client/server case where the client sends messages in two threads and the server receives them properly. However, when I make this go both ways and add another thread, the problem reemerges. If anyone would like to try the code, I've uploaded both cases at this link: http://www.megaupload.com/?d=K4R58YRQ. Testmulti is the simple one that works, while Multest is the advanced one that causes problems.
Since all nodes are receivers as well, they run the RcvMsg() function right at the beginning of the application.
Also, because I'm using persistent sockets for in-order delivery of packets, every time a connection is made, a new thread is launched (RcvHandler) and readLine is run on that connection in a while loop. There should be at most 8 or so connecting nodes.
Upon receiving a message, the receiver adds it to a MsgBuffer (basically a ConcurrentLinkedQueue) for processing sequentially (in another thread). In a simple test case, I'm just making it display the message using System.out.println, which is where I see that nothing is printed at times on some nodes.
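For reference, the receiving side described above could be sketched roughly as follows. This is not the original code: the class name `ReceiverSketch`, the `port()` helper and the field name `msgBuffer` are made up for illustration, and a `LinkedBlockingQueue` stands in for the ConcurrentLinkedQueue-based MsgBuffer so that the processing thread can block on `take()` instead of polling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ReceiverSketch {
    // Stand-in for MsgBuffer (assumption: a blocking queue instead of ConcurrentLinkedQueue).
    final BlockingQueue<String> msgBuffer = new LinkedBlockingQueue<>();
    private final ServerSocket server;

    ReceiverSketch(int port) throws IOException {
        server = new ServerSocket(port); // port 0 lets the OS pick a free port
    }

    int port() {
        return server.getLocalPort();
    }

    // Plays the role of RcvMsg(): accept connections, one reader thread per peer.
    void start() {
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket peer = server.accept();
                    Thread handler = new Thread(() -> readLoop(peer));
                    handler.setDaemon(true);
                    handler.start();
                }
            } catch (IOException e) {
                // accept() also throws once the server socket has been closed.
                if (!server.isClosed()) e.printStackTrace();
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    // Plays the role of RcvHandler: read lines until the peer closes the connection.
    private void readLoop(Socket peer) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(peer.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                msgBuffer.put(line); // hand over to the processing thread
            }
        } catch (IOException e) {
            e.printStackTrace(); // surface read errors instead of dropping them silently
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The important detail is that exactly one BufferedReader is created per connection and kept for its whole lifetime. Creating a second reader on the same socket can strand lines in the first reader's buffer.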
I'm not sure if the problem is on the sender's side or the receiver's.
I read somewhere that using RMI is also a possibility, but I'm not too well informed about it. Could it be used in a case such as mine with a relatively high sending rate?
There are a couple of red-flag issues here. I won't say that this is a definite diagnosis of your problem, as I haven't seen all the code, but the issues are:
1) Using readLine()/println() on sockets is dangerous because the definition of "end-of-line" is different on different platforms. If you send from a UNIX platform and read from a Windows one, the Windows one wants something more for end-of-line than the UNIX one will send (i.e., \r\n vs. \n), and so readLine() may not return. It's better to send explicit end-of-line sequences, because then you know exactly what's going on the wire.
2) Using BufferedReader like this is a no-no because of the buffering. A client can get hung up waiting for more data to arrive over the network while the data it actually wants is already hiding in the BufferedReader's buffer. If you add a second argument to the constructor -- the integer "1", which is the buffer size -- it effectively turns off buffering and makes this problem go away. This advice is very handy when using BufferedReader at the command line, too. Of course, see 1) -- you really don't want to use readLine(), anyway.
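To make point 1) concrete, here is a minimal sketch of sending with an explicit terminator. The class and method names (`LineFraming`, `send`, `readAll`) are made up for illustration. One caveat: Java's BufferedReader.readLine() is documented to accept \n, \r, or \r\n as a terminator, so between two Java peers a separator mismatch may matter less than on other stacks. Writing a fixed "\n" and flushing after every message still removes any doubt about what is on the wire.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

final class LineFraming {
    // Write one message with a fixed '\n', independent of the platform's line.separator,
    // and flush so the bytes leave the writer's buffer immediately.
    static void send(Writer out, String msg) throws IOException {
        out.write(msg);
        out.write('\n');
        out.flush();
    }

    // Read every line until end of stream; readLine() strips the terminator.
    static List<String> readAll(Reader source) throws IOException {
        List<String> messages = new ArrayList<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            messages.add(line);
        }
        return messages;
    }
}
```

With this framing a message must not contain a line break itself, or the receiver will see it as two messages.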
I've now tried adding a second argument of "1" to the BufferedReader constructor and adding a "\n" to the string before using print to send it. However, the behavior hasn't changed much. The output for the two nodes is the following:
At the end, the two nodes have not received the same number of Hello messages, though this time all 500 numbered packets were received correctly. Usually, it is the Hello messages that are missed and not the numbered ones.
As I mentioned previously, when I use two threads to send similar numbered packets, there is never any problem. It's only when I replace one of those threads with this other thread that sends Hello messages every 20 seconds that I start missing messages.
Besides the 500 numbered messages being received properly, both nodes should send each other a Hello message every 20 seconds, and receiving it confirms that the peer is still alive and running the application. In the current runs, some Hello messages are missed, so the code concludes that the node has failed.
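The Hello mechanism can be sketched like this. It is only an illustration of the idea, not the application's code: `LivenessTracker`, `onHello`, `isAlive` and `startHellos` are hypothetical names, and the timeout of three missed intervals is an assumption rather than what the application currently does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class LivenessTracker {
    static final long HELLO_INTERVAL_MS = 20_000;          // every 20 seconds, as described
    static final long TIMEOUT_MS = 3 * HELLO_INTERVAL_MS;  // assumption: tolerate two late Hellos

    // Last time (in milliseconds) a Hello arrived from each node.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    // Called by the receiving side whenever a Hello from nodeId arrives.
    void onHello(String nodeId, long nowMs) {
        lastSeen.put(nodeId, nowMs);
    }

    // A node counts as alive if a Hello was seen within the timeout window.
    boolean isAlive(String nodeId, long nowMs) {
        Long seen = lastSeen.get(nodeId);
        return seen != null && nowMs - seen <= TIMEOUT_MS;
    }

    // Runs sendHello immediately and then every 20 seconds on a dedicated timer thread.
    static ScheduledExecutorService startHellos(Runnable sendHello) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(sendHello, 0, HELLO_INTERVAL_MS, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

Passing the current time in as a parameter keeps the check easy to test. In the application it would be `System.currentTimeMillis()`.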
Also, after changing the println and BufferedReader portion of the code, it seems that some of the messages arrive with a huge delay which hinders the running of the application.
I've simplified the problem so that it consists of two nodes, each running two sending threads and with a ServerSocket open for receiving messages. This also seems to cause messages to go missing without any exceptions. Both sides indicate that they have sent 1000 messages, but neither side reports receiving that many. If only one node is the sender and one is the receiver, there is no problem. Also, if only one thread is sending, the application runs problem-free. It is only when both sides act as senders and receivers, with multiple threads using the same persistent socket, that the problem arises.
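Since the symptom only shows up when several threads share one persistent socket, one thing worth ruling out is unsynchronized access on the sending side: a plain HashMap is not safe for concurrent use, and two threads doing their first send at the same moment could each open a socket to the same destination. The sketch below is an assumption about what a thread-safe sender might look like, not a confirmed diagnosis. `SenderSketch`, `send` and `connect` are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SenderSketch {
    // One writer per destination, keyed by "host:port".
    private final Map<String, Writer> writers = new ConcurrentHashMap<>();

    void send(String host, int port, String msg) throws IOException {
        Writer out;
        try {
            // computeIfAbsent runs connect() at most once per key, even if several
            // threads send to a new destination simultaneously. Connecting inside it
            // blocks other updates that hit the same map bin, acceptable for a few nodes.
            out = writers.computeIfAbsent(host + ":" + port, key -> connect(host, port));
        } catch (UncheckedIOException e) {
            throw e.getCause(); // unwrap the IOException thrown by connect()
        }
        // Make "message + terminator + flush" atomic with respect to other senders,
        // so lines from different threads cannot interleave on the shared socket.
        synchronized (out) {
            out.write(msg);
            out.write('\n');
            out.flush();
        }
    }

    private static Writer connect(String host, int port) {
        try {
            Socket socket = new Socket(host, port);
            return new BufferedWriter(
                    new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each thread then calls `send(host, port, message)`. Messages from one thread stay in order relative to each other, while the interleaving between threads is whatever order they acquire the lock in.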
Could this be a problem in the way I've implemented sockets? Thanks!