I have finished the network part of the assignment (though it still needs to be documented and properly packaged), and I'm ready to move on to the last part of it (finally!): the GUIs. It's rather difficult to find people here to talk with about a sockets solution (just try a search with the keyword "sockets" and you'll get a mere 43 posts in 2003, most of them talking about RMI). Andrew wrote that one of the reasons he chose RMI is that he had previous experience with sockets, so digging into RMI was a good learning experience. As I had experience in neither RMI nor sockets, I chose the latter because it felt more challenging to me. On one hand, as a newbie in sockets (and network) programming, I'd like to get my sockets implementation validated (or criticized). BTW, I have no "real" network access, so all my tests have been performed in loopback, which is not reassuring. On the other hand, it may be interesting for people here (now and in the future) who hesitate between RMI and sockets to see what a sockets implementation is (or may be), and if they choose the latter, what issues they will have to face and some ways to solve them. Let me know if you are interested. Best, Phil.
Hi Philippe, You know me - I love to discuss all topics. I agree that this discussion will, no doubt, be very beneficial to others. Hopefully others will add their comments asking for you to start us off. Regards, Andrew
Yes, I know what an amazing guy you are... and knowing you, I didn't wait for anything more than some encouragement to press on! Just give me the time to write it up, and anyway, it's nearly time for you to go to sleep...
I agree that this discussion will, no doubt, be very beneficial to others.
For others maybe (and I hope so), for me for sure! (I feel quite lonely in my sockets adventures.)
Hopefully others will add their comments asking for you to start us off.
I won't wait! See you soon, Best, Phil.
Hi guys, I certainly find this topic interesting as well. I have read a lot of responses suggesting that the decision between sockets and RMI was mostly about an interest in a particular topic (sockets or RMI), and that the major argument for RMI is that it completely isolates the communication layer between client and server from the application logic. Is Sun doing this explicitly to emphasise the advantages of RMI and promote its usage? Do you see any particular reasons to use sockets? In the sockets case, synchronization of requests has to be taken care of by the application; wouldn't that be reinventing the wheel?
Hi Sathya and Ulvi, Great! I feel less lonely now! Please be patient, I won't be able to post more about it before late tonight (here in Brussels = +8h). But I promise you (and hope for) an in-depth technical discussion about sockets. Because you're right, Ulvi: most of the rare posts where "sockets" is a keyword are just dedicated to the choice to be made between RMI and sockets, but not much more. And I found that so frustrating when I was struggling with some technical issues! Best, Phil.
Hi everybody, Thank you all for entering this discussion. Funny introduction While finishing the db part of the assignment (where I invested far too much time and energy IMO, with a global design which is anything but simple, though scalable and performant), I made a promise to myself: "For the next two parts (network and GUI), keep it simple, Phil!". Well, unfortunately, I had decided to implement a sockets solution before I made that promise, sockets are anything but simple (though I didn't know to what extent), and finally I am too faithful to my own decisions (but not to my promises). The Design Choice Issue Yes, let's take it from the beginning. In my instructions (URLyBird 1.2.1) it is stated:
Network Communication Approach You have a choice regarding the network connection protocol. You must use either serialized objects over a simple socket connection, or RMI. Both options are equally acceptable.
and (under "Packaging of Submissions") :
A file called choices.txt that containing pure ASCII (not a word processor format) text describing the significant design choices you made. Detail the problems you perceived, the issues surrounding them, your value judgments, and the decisions that you made.
It means that you cannot choose one solution over the other without justifying your choice. Like so many people here, I read Max's book (and for the others: it's the best investment you can make as far as SCJD preparation is concerned). According to his book, the pros of sockets are performance and scalability. I will add this one: it's a standard, I mean an open standard, while RMI is "just" a Java standard. Indirectly, that's what Max writes too when he says "... sockets are well suited for sending data, often in compressed form, ...". "Compressed form": you send and receive whatever you want, as long as both sides of the connection agree on it. To be honest, let's have a look at the cons of sockets (which are the opposite of RMI's pros): more complex to implement (low-level, no "network transparency", the need to build a multi-threaded server yourself). Here comes the issue. We are not talking about the weather, but about a design choice: the one which answers this simple question: "Why did you choose sockets over RMI?". And there goes the design simplicity you had in mind despite your promises, because you simply cannot claim "I chose sockets over RMI for performance, scalability and openness considerations" while coming up with a slow, unscalable and closed solution. Sockets may be simple: Ephemeral connections / One thread per connection Server-side, you just need a ServerSocket accepting on a given port. When a client comes in, it creates a Thread, passes the new connection socket to it, and goes on accepting new connections. That Thread's job is simple too: read from its socket InputStream, interpret what's being read as some Command (an abstract "executable thing"), get the result, send it back to the same socket through its OutputStream, close the socket and die. Thanks to Java serialization, even the "marshalling" is simple: you get commands with a simple readObject() and send results with a simple writeObject().
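To make the above concrete, here is a minimal sketch of such a thread-per-connection server. All names are mine, not from the actual assignment code, and a plain String stands in for a real serialized Command:

```java
import java.io.*;
import java.net.*;

// Sketch of the ephemeral-connection, one-thread-per-client server described
// above. A String plays the role of the Command; a real server would
// readObject() a Command instance and writeObject() a CommandResult.
public class SimpleCommandServer {

    // Stand-in business logic for "execute the command, get the result".
    static Object execute(Object command) {
        return command.toString().toUpperCase();
    }

    // Handles exactly one request on the given socket, then closes it.
    static void serve(Socket client) {
        try (ObjectOutputStream out = new ObjectOutputStream(client.getOutputStream());
             ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
            out.writeObject(execute(in.readObject()));
            out.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Accept loop: one new Thread per incoming connection.
    public static void acceptLoop(ServerSocket server) {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();
                new Thread(() -> serve(client)).start();
            } catch (IOException e) {
                return;   // server socket closed
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);   // ephemeral port, loopback test
        new Thread(() -> acceptLoop(server)).start();

        // A client: connect, send one "command", read the result, disconnect.
        try (Socket s = new Socket("127.0.0.1", server.getLocalPort());
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
             ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
            out.writeObject("hello");
            out.flush();
            System.out.println(in.readObject());   // HELLO
        }
        server.close();
    }
}
```

Note the usual ordering trick: both sides construct the ObjectOutputStream before the ObjectInputStream, because each ObjectInputStream constructor blocks until it has read the peer's stream header.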
Unfortunately, that implementation is not performant, not scalable, and closed: Not performant, because each time some request needs to be sent to the server, a network connection is opened and then closed, which is very time consuming. To execute the command, a new thread is allocated, which is time consuming too. Not scalable, because threads are a scarce resource on any system: allocating a thread for each connection automatically limits the number of concurrent connections. And finally closed, because the only marshalling protocol that basic implementation supports is Java serialization: if you later want to connect to your application server with something other than a Java application, you'll be in big trouble. Or a little more complex: Permanent connections / Pool of threads Permanent connections: Once accepted by the server (a client connects at start time), a given connection stays open during the whole client's life (except if the connection is broken for any reason, in which case some reconnection must happen). Pool of threads: We have threads created and started which never die while the server is running. In my implementation, I called them Handlers; there is a number of them created from the start (a property), and a maximum number (a property too). Incoming connections are put in a queue, automatically allocated to some handler which... handles it and puts it back in the queue (if nothing "bad" happened in the meantime from the network point of view). In theory, it is not that much more complex BTW, and far more performant and scalable. More performant, because threads stay alive, as do network connections. More scalable, because you may have many more "concurrent" connections than you have threads running.
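The pool-of-handlers idea can be sketched roughly like this. The names are illustrative only, and plain Runnables stand in for ready-to-handle connections:

```java
import java.util.concurrent.*;

// Sketch of the "pool of handler threads + queue of connections" design
// described above. Runnables stand in for connections with pending work;
// class and method names are illustrative, not from the assignment code.
public class HandlerPool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    public HandlerPool(int handlers) {
        for (int i = 0; i < handlers; i++) {
            Thread handler = new Thread(() -> {
                try {
                    while (true) {
                        // Block until some connection is enqueued, handle it.
                        queue.take().run();
                    }
                } catch (InterruptedException e) {
                    // pool shutting down
                }
            });
            handler.setDaemon(true);
            handler.start();
        }
    }

    // Called when a connection has (or may have) pending data; a handler
    // will dequeue it, handle it, and the caller re-enqueues it afterwards.
    public void enqueue(Runnable connectionWork) {
        queue.add(connectionWork);
    }
}
```

With this shape, the number of handlers can be tuned independently of the number of connections, which is the scalability argument made above.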
But in practice, I noticed that it is difficult to achieve: when you allocate existing connections to threads in some FIFO order, it doesn't make sense for a given handler to wait some time on a given connection, maybe just to notice that it had nothing to do. So I set the timeout of the client socket server-side to its minimum value: setSoTimeout(1). Here came the biggest issue I had to solve: a SocketTimeoutException may be thrown when there is nothing to read from the socket and the timeout expired (I understand that), but sometimes it happens that it's thrown in the middle of a read (!). At first sight, it just seems funny, but let's say that the client sent some request object serialized into a 3447-byte stream. How do you think your server-side ObjectInputStream.readObject() reacts when it is interrupted after reading just 845 of them? I can tell you: it hates that. At best you get an EOFException, but it can be a StreamCorruptedException as well. I wrote "at best" just because an EOFException is more understandable. But in any case it's an unrecoverable error. Fortunately, the solution I found was just what I needed to achieve a design decision I had in mind from the beginning: decouple the marshalling process from the communication layer. Before I come back to that issue, just a few words on the latter: sockets are a standard, while serialized objects over socket connections are a pure Java-to-Java solution: sockets themselves know only the byte streams they may receive and/or send. Objects of any kind must be packaged in some way that both parties (server and client) understand. That's the marshalling process, and Java object serialization is just one of them.
If you abstract the marshalling, you get a much more "open" system, open to the outside world, but even within the Java one: Open to the world: as long as client and server can interpret a given byte stream with some common protocol, they can communicate over sockets (a contrived example: some (legacy) ASP application querying your (new) Java application server). Open to Java: let's say that among your 10 CSRs, one of them negotiated to be allowed to work from home remotely. If performance considerations require it, and if your application server supports multiple marshalling schemes (as mine does), it's easy to add one more to support those remote clients: compressed serialized objects. Back to the SocketTimeoutException issue: the solution simply consists of reading from the socket InputStream as many bytes as you can before getting interrupted, and putting them in some buffer (back to this buffer soon). If 0 bytes are read, you are done. If more are read, just delegate to some ObjectConverter (back to it soon too), which tries to interpret them and throws an IncompleteObjectException in case the byte stream is incomplete. When that arises (very rare), you just need to read more bytes, till the object is complete. Of course, if an IOException of some sort is thrown in the middle, the process stops, the connection is lost and the client gets a SocketException. I have 2 things to tell about that buffer:
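The "deserialize, or ask for more bytes" logic might look something like this. This is only a sketch: IncompleteObjectException is modeled after the exception named above, and detecting incompleteness via EOFException/StreamCorruptedException is my assumption about how it could work:

```java
import java.io.*;

// Sketch of the deserialize-or-wait-for-more-bytes logic described above.
// tryConvert() plays the role of the post's ObjectConverter: it either
// returns a complete object or signals that the buffer is still incomplete.
public class ObjectAssembler {

    // Thrown when the buffered bytes do not yet hold a whole object
    // (named after the IncompleteObjectException mentioned in the post).
    public static class IncompleteObjectException extends Exception {}

    public static Object tryConvert(byte[] buffer, int length)
            throws IncompleteObjectException, IOException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer, 0, length))) {
            return in.readObject();
        } catch (EOFException | StreamCorruptedException e) {
            // The caller should read more bytes from the socket and retry.
            throw new IncompleteObjectException();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Serialize a sample object...
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject("a request object");
        }
        byte[] full = bytes.toByteArray();

        // ...a half-received buffer is detected as incomplete...
        try {
            tryConvert(full, full.length / 2);
        } catch (IncompleteObjectException e) {
            System.out.println("incomplete, read more bytes");
        }
        // ...while the full buffer deserializes fine.
        System.out.println(tryConvert(full, full.length));   // a request object
    }
}
```

The key point is that the ObjectInputStream is built over the accumulated buffer, not over the socket stream itself, so a SocketTimeoutException can never hit it mid-read.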
It must be "resizable": while reading from the socket InputStream, you have no way to know how much data you'll get. So if the buffer gets full, you may call its increaseCapacity() method (the current size is increased by some increaseCapacityFactor). And when writing an object representation to it (through an OutputStream of my own), you may call its ensureCapacity() method to make sure your object fits in.
Thanks to a design mistake, I decided to implement it "softly", I mean through a soft reference. Let me explain: initially, such a buffer was owned by my ClientConnection class (bad design). And I thought: it would be stupid to see your server running out of memory because of buffers owned by "dormant" connections. Now that they are owned by the connection handlers, it is of little interest to make them "soft", but I kept that so-called SoftResizableBuffer class as it was. How does it work? The buffer is privately stored as a SoftReference. The garbage collector is allowed to clear soft references before running out of memory, as long as there is no remaining strong reference to them. The public method getBuffer() returns the buffer if it was not cleared (probably to be stored in some normal "strong" reference, preventing the garbage collector from clearing it), or allocates a new one at its initial capacity if it has been cleared. Two additional methods (fix() / unfix()) allow a process to temporarily prevent the buffer from being cleared without needing to store (and pass along to other methods) the strong reference obtained from getBuffer(). Now thanks to a bug (handlers were created up to their maximum number even when they weren't needed to serve existing connections) (there are so many thanks-to-mistakes in this paragraph), I saw SoftResizableBuffer at work for real: some of them were reallocated sometimes at their initial capacity, growing as needed, then cleared and reallocated, etc. SoftReferences are magic! After correcting that bug, I wondered if it wouldn't be better to simplify it (making it a simple ResizableBuffer). My conclusion is that the "soft" behaviour is still interesting, to smooth memory peaks. Let's say that such a buffer has an initial capacity of 32Kb. Now some handler needs to handle a huge query result (1Mb?). No problem, the buffer grows. But after having sent the result back to the client, what to do with that "huge" buffer? Deallocate it?
It would be a pity, because the next connection to handle may need such a huge buffer too. Keep it? If other handlers need such a huge buffer too, we risk the fatal OutOfMemoryError. Just keeping it "soft" is clearly the best solution.
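A SoftResizableBuffer along the lines described could be sketched as follows. The method names (getBuffer, fix, unfix, increaseCapacity, ensureCapacity) follow the posts above; everything else is my guess at an implementation:

```java
import java.lang.ref.SoftReference;

// Sketch of the SoftResizableBuffer described above: the byte array is held
// through a SoftReference, so the garbage collector may reclaim "dormant"
// buffers under memory pressure, and getBuffer() reallocates transparently.
public class SoftResizableBuffer {
    private final int initialCapacity;
    private final double increaseCapacityFactor;
    private SoftReference<byte[]> ref;
    private byte[] pinned;   // non-null while fix()ed: a strong reference

    public SoftResizableBuffer(int initialCapacity, double increaseCapacityFactor) {
        this.initialCapacity = initialCapacity;
        this.increaseCapacityFactor = increaseCapacityFactor;
        this.ref = new SoftReference<>(new byte[initialCapacity]);
    }

    // Returns the buffer, reallocating it at initial capacity if the
    // garbage collector cleared it in the meantime.
    public byte[] getBuffer() {
        byte[] buffer = ref.get();
        if (buffer == null) {
            buffer = new byte[initialCapacity];
            ref = new SoftReference<>(buffer);
        }
        return buffer;
    }

    public void fix()   { pinned = getBuffer(); }  // GC may not clear it now
    public void unfix() { pinned = null; }         // eligible for clearing again

    // Grows the buffer by increaseCapacityFactor, preserving its contents.
    public byte[] increaseCapacity() {
        byte[] old = getBuffer();
        byte[] grown = new byte[(int) (old.length * increaseCapacityFactor)];
        System.arraycopy(old, 0, grown, 0, old.length);
        ref = new SoftReference<>(grown);
        if (pinned != null) pinned = grown;
        return grown;
    }

    // Ensures the buffer can hold at least minCapacity bytes.
    public byte[] ensureCapacity(int minCapacity) {
        byte[] buffer = getBuffer();
        while (buffer.length < minCapacity) {
            buffer = increaseCapacity();
        }
        return buffer;
    }
}
```

This gives exactly the "smooth memory peaks" behaviour argued for above: a buffer that grew to 1Mb survives as long as memory is plentiful, but is the first thing reclaimed when memory gets tight.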
Mmh, this post is huge already, it's late here, and I still have a few things to tell you about (if you're still interested):
Abstract marshalling : how class SocketObjectReaderWriter, interface ObjectConverter and classes ObjectConvertersFactory and ObjectSerialConverter work together in order to support multiple marshalling protocols at the same time.
The Command pattern as implemented in my solution (no switch server-side)
How sessions (objects which maintain the "state" of a connection) are handled abstractly. An example of a useful Session object that a permanent network connection may be bound to would be a database connection.
The two-way communication : how optional callbacks are supported in a simple way, limited in my implementation to server-side messages broadcastable to clients
The hand-shake process : how clients and server agree (or not) on some protocol.
No indigestion yet? Best, Phil.
For my implementation, I used both sockets and RMI (that is, I designed the application to make the networking layer modular). By doing so, I could switch connection modes from the GUI itself: RMI, local, or sockets.
Hi Tarun, Wow! How did you do that? And especially the sockets part? Do you mean that your server is an RMI server and serves objects over sockets at the same time? Best, Phil.
Hi everybody, My sockets implementation (continued) Here are the few additional things I wanted to tell you about. This time, I'll try to be brief! Abstract marshalling Server-side, every ClientConnection owns a SocketObjectReaderWriter instance which in turn owns an ObjectConverter. ObjectConverter is an interface:
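A possible shape for that interface and its factory is sketched below. The class names come from the post, but all signatures are my guesses, and the post's UnsupportedFormatException is replaced by IllegalArgumentException to keep the sketch short:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Sketch of the abstract-marshalling types described above. Names are from
// the post; signatures are assumptions.
interface ObjectConverter {
    byte[] write(Object object) throws IOException;
    Object read(byte[] buffer, int length) throws IOException;
}

// Marshalling by plain Java serialization.
class ObjectSerialConverter implements ObjectConverter {
    public byte[] write(Object object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    public Object read(byte[] buffer, int length) throws IOException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer, 0, length))) {
            return in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }
}

// Singleton factory mapping the hand-shake format byte to a converter.
// Other byte values stay reserved for additional marshalling protocols
// (e.g. compressed serialized objects, as suggested in the post).
class ObjectConvertersFactory {
    public static final byte SERIAL = 0;

    private static final ObjectConvertersFactory INSTANCE = new ObjectConvertersFactory();
    private final Map<Byte, ObjectConverter> converters = new HashMap<>();

    private ObjectConvertersFactory() {
        converters.put(SERIAL, new ObjectSerialConverter());
    }

    public static ObjectConvertersFactory getInstance() { return INSTANCE; }

    public ObjectConverter getConverter(byte formatByte) {
        ObjectConverter converter = converters.get(formatByte);
        if (converter == null) {
            throw new IllegalArgumentException("unsupported format: " + formatByte);
        }
        return converter;
    }
}
```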
It is able to construct/write an Object from/to a byte buffer. During the hand-shake process, SocketObjectReaderWriter receives a byte in its constructor identifying the marshalling protocol. Its ObjectConverter is then constructed (or simply retrieved if such an instance already exists) by a call to ObjectConvertersFactory.getConverter(byte formatByte) throws UnsupportedFormatException. ObjectConvertersFactory is a singleton. The only ObjectConverter implemented is a class I called ObjectSerialConverter (marshalling done by serialization). Notice that it would be easy to support additional formats by reserving other format byte values, extending ObjectConvertersFactory and implementing additional ObjectConverters. Client-side, it's simpler. As it must support only one format at a time, there is no need for a factory. A second SocketObjectReaderWriter constructor accepts a pre-constructed ObjectConverter, which in my URLyBird implementation is an ObjectSerialConverter instance. Sessions As a reminder, a Session object is bound to a ClientConnection to store any state information belonging to that connection. If a connection doesn't need a Session, it remains null. Session is an interface:
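The Session interface itself might be as small as this. Only open() and close() are described in the post; the example implementation and its isOpen() helper are illustrative additions:

```java
// Minimal sketch of the Session interface described above: open() is called
// before the first command, close() when the connection ends.
interface Session {
    void open();
    void close();
}

// Typical use: hold an expensive per-connection resource for the whole
// connection's lifetime, e.g. a database connection (faked here with a
// plain Object stand-in).
class DatabaseSession implements Session {
    private Object databaseConnection;   // stand-in for a real DB handle

    public void open()  { databaseConnection = new Object(); }
    public void close() { databaseConnection = null; }

    public boolean isOpen() { return databaseConnection != null; }
}
```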
During the hand-shake process, a Session class name may be sent by the client, in which case the server's SessionFactory instantiates a Session instance by name and binds it to the connection. The first time a command is sent by the client, Session.open() is called. And when the connection closes for any reason, Session.close() is called. You will typically implement Session in a class where you get a database connection in open() and release it in close() if needed. If you don't need such a permanent database connection, it's still possible to do nothing in open() / close() and get the job done from your Command (the Session object is passed to its execute() method as shown below). Commands and CommandResults Both are interfaces:
In the business tier, you need to create one class implementing Command per request type (a BookCommand for example) and one class implementing CommandResult per possible result type. Here is an example of an UppercaseCommand:
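The original code block appears to have been lost from this post, so here is a hedged reconstruction of what the two interfaces and an UppercaseCommand could look like. The signatures are guesses, and the Session parameter is typed loosely as Object to keep the sketch self-contained:

```java
import java.io.Serializable;

// Hedged reconstruction of the lost example. Command and CommandResult are
// modeled from the post's description; the exact signatures are guesses.
interface Command extends Serializable {
    CommandResult execute(Object session) throws Exception;   // session typed loosely
}

interface CommandResult extends Serializable {
    Object getValue();
}

// One class per request type: this one uppercases a string server-side.
class UppercaseCommand implements Command {
    private final String text;

    public UppercaseCommand(String text) {
        this.text = text;
    }

    public CommandResult execute(Object session) {
        final String result = text.toUpperCase();
        return new CommandResult() {
            public Object getValue() { return result; }
        };
    }
}
```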
Server-side it's very simple: when a connection is handled and a Command is to be processed, those few lines of code do the job:
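Those "few lines" were not preserved in the post either; here is a sketch of what such a dispatch step could look like, with the types modeled minimally so the example stands alone:

```java
import java.io.*;

// Hedged sketch of the server-side dispatch step described above: read a
// Command, execute it against the connection's Session, write the result
// back. No switch on request type anywhere: the Command itself knows what
// to do (the Command pattern the post mentions).
public class Dispatcher {

    public interface Command extends Serializable {
        Serializable execute(Object session) throws Exception;
    }

    // Example command used in the round trip below.
    public static class PingCommand implements Command {
        public Serializable execute(Object session) { return "pong"; }
    }

    public static void handle(ObjectInputStream in, ObjectOutputStream out,
                              Object session) throws Exception {
        Command command = (Command) in.readObject();
        out.writeObject(command.execute(session));
        out.flush();
    }
}
```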
Client-side it's quite simple too: after having instantiated the right Command, it is sent to the server and the result retrieved by the Connection method "public CommandResult executeCommand(Command cmd) throws ApplicationException". OK, I am nearly done with the whole design. As I promised to be brief, I'll postpone the few remaining points to a next (and last) post:
Two-way communication / Optional callbacks
The two main classes (singleton ConnectionsManager server-side and Connection client-side)
Best, Phil.
[Philippe] sockets [are] more complex to implement [than RMI]
Well, yes and no. I just created a simple remote application using both RMI and sockets. The sockets version was actually a little more complex than it had to be, because I used a Command pattern bean (17 lines of code) to pass instructions to the server. The socket server itself was 40 lines long. The RMI server was 39 lines long. The socket client was 22 lines long; the RMI client was 16 lines long. So for something very simple, we are not talking about writing major amounts of code in one and not the other. And I would argue that understanding the socket code is no harder than understanding the RMI code. Now as you mentioned, there are other issues you have to contend with for sockets, including setting timeouts to reasonable values and so on. But how many of these do you really have to contend with for this assignment? We keep going down this path of designing a much more sophisticated solution than we really need. For both sockets and RMI, the client would have the connectivity abstracted, so the client code itself would be no more complex. And for sockets you don't have to call rmic, and you don't have to ship the stub files (or make them available for dynamic downloading, with its associated issues with security and setting the codebase). When you consider these, you may find that the junior programmer has more chances of breaking something.
serialized objects over socket connections is a pure java-to-java solution
Sun has done everything but make it an open standard. You can go to their web site and download all the specifications you need in order to work out how to serialize / deserialize an object in any language. Doing this could actually be easier than implementing a custom protocol. ----- I think that making your solution provide a pool of threads has made it much more complex than it needs to be. If you took that out, so that you now have permanent connections and a timeout of, say, 1 second (not 1 millisecond like you have), wouldn't you be able to remove half the issues you mention above? ----- You haven't really mentioned any downsides to using RMI. So here goes... With RMI you always have an extra process running (the rmiregistry). It always keeps one socket open (with unknown numbers of threads) to listen for binding requests and client lookups. Most system administrators are already familiar with what needs to be done to implement a server that listens on a particular socket, even if they need to go through a firewall. How many do you think are familiar with how to do it with RMI? Yes, RMI can work through a firewall, but it takes extra work. To do the same, very simple task using either RMI or sockets (using the programs I mentioned earlier):
Total network traffic for sockets: 472 bytes
Total network traffic to bind and lookup on rmiregistry: 1467 bytes
Total network traffic for RMI client to get remote factory instance: 1812 bytes
Total network traffic for RMI remote procedure call: 633 bytes
This is pretty much pure network overhead - the programs I wrote did not pass any parameters, nor did they receive any results back. I will think about this some more and add more later. Regards, Andrew
Oh my god, Philippe! First of all, I am happy that you are approaching the whole thing in a nice, methodical way. I really like that, but my initial concern is still there: how did you make the decision between sockets+serialization and RMI? In the first half of your explanation, you mention the cons and pros of both options. Sockets w/ serialization: (+) performance (+) scalability (+?) standard (sockets are, serialization isn't) (-) difficult to implement when it is multi-threaded. RMI: (+) easy to implement, multi-threading is included (-) less performant (-) Java specific. And you somehow find yourself deep in sockets/buffers/threads/exceptions/comms layers/marshalling. If your decision is not due to the weather or because you just feel like it (I am not sarcastic at all, I feel the same towards the RMI guys, too), can you please tell me what made you jump over to sockets? I must admit, it must be a hell of a lot of material you're going through, very good in terms of grasping a lot of difficult concepts, but the design decision must depend on something else. So what is your reason? Ulvi
Hi Andrew, Thanks a lot for your reply.
quote: "[Philippe] sockets [are] more complex to implement [than RMI]" Well, yes and no.
That's what I understood from what Max writes in his book on p. 317. But you are right, it may be simple.
And I would argue that understanding the socket code is no harder than understanding the RMI code.
Yes, and paradoxically, that's because sockets are more low-level. If your implementation is well-designed (hopefully mine is), you have a few classes which naturally fit together and whose names are explicit enough to immediately understand which part of the job they do. Moreover, most of those classes are "hidden" within the network framework. From the point of view of an application writer using the framework I am describing in a simple way (single marshalling protocol), the only classes/interfaces they must know something about are: Server-side: ConnectionsManager. Client-side: Connection. Both sides: Session, Command and CommandResult. On the contrary, RMI is a high-level technology: so you need to know about its registry, rmic, you get generated pieces of code, and generally speaking a quite complex API. Just one example: how many posts did we read over the last six months just about how to use RMI's Unreferenced interface merely to know that a client crashed?! In comparison, in my simple sockets implementation, there are no such worries when a client crashes: the server will be aware of it within the next few milliseconds, close the session and smoothly lose the connection.
Now as you mentioned, there are other issues you have to contend with for sockets, including setting time outs to reasonable values and so on. But how many of these to you really have to contend with for this assignment? We keep going down this path of designing a much more sophisticated solution than we really need.
Right again. It probably comes from my (bad?) tendency to try to design and code with some possible reusability in mind when it seems reasonable to do so. In the real world, if you have to design some network solution from scratch for a given application, it's highly probable you'll have to reuse some comparable solution for the next application to be written. So your choice is this one: either you design a fine-tunable solution which may work well in different contexts (a few concurrent connections in application_1 / many concurrent connections in application_2, with a few of them running remotely), and you have more work in the beginning but only one code base to maintain and debug; or you write simpler but different solutions you'll have to maintain separately.
Sun have done everything but make it an open standard. You can go to their web site and download all the specifications you need in order to work out how to serialize / deserialize an object in any language. Doing this could actually be easier than implementing a custom protocol.
Nice! I'll have a look at it. If it's easier and really open, it would mean that abstracting the protocol is of less interest. I'll check that.
I think that making your solution provide a pool of threads has made your solution much more complex than it needs to be. If you took that out, so that you now have permanent connections, and a timeout of say 1 second (not 1 millisecond like you have), wouldn't you be able to remove half the issues you mention above?
I understand what you mean: one handler (thread) allocated per (and dedicated to each) permanent client connection, so that a small timeout isn't required anymore... and I would be rid of all the issues caused by a SocketTimeoutException thrown in the middle of a read. Two remarks about it:
The pool-of-threads solution is still more scalable (and hence reusable), because for a given number of concurrent connections, the optimal number of running threads will depend on the machine you're running on (number of processors) and its OS (see this article about threads on Linux). So if the maximum number of handlers may be set independently of the maximum number of supported connections, you get not only a more scalable solution, but a more portable one.
I could reproduce the situation of a timeout occurring in the middle of a read with a 1 millisecond timeout. You are probably right that the higher the timeout value, the lower the risk of getting interrupted while reading. But how to validate it? Let's say that with timeout=1 I encounter the problem every 1/1000 calls; with timeout=1000, will it be never, or 1/1000000? As my read solution is 100% safe, would I need to change it? But you make me think of a little optimization (not sure I'll implement it, for simplicity - I can feel you smiling): each time a connection is handled, the handler would take this decision: in the case there are as many handlers as connections, it could keep its connection and increase its timeout reasonably, instead of re-enqueuing it, where it would undoubtedly be dequeued immediately by another handler. Otherwise, keep (or reset) the timeout to 1 and re-enqueue the connection. I see two main advantages of this solution: potentially fewer enqueue/dequeue/notify operations, and fewer interruptions while reading, which are costly when they happen though rare (misinterpret the buffer, throw an IncompleteObjectException and loop back to read the rest of the object).
You haven't really mentioned any downsides to using RMI. So here goes...
Thanks for those additional arguments in favor of sockets. BTW, I didn't mention here yet another way Max suggests to speed up serialization over sockets: making use of the Externalizable interface.
I will think about this some more and add more later.
Thanks for keeping this discussion up and running. I find it very interesting. Best, Phil.
Hi Ulvi, Thanks for your comments.
Sockets w/ serialization: (+) performance (+) scalability (+?) standard (sockets are, serialization isn't) (-) difficult to implement when it is multi-threaded. RMI: (+) easy to implement, multi-threading is included (-) less performant (-) Java specific
Remember, those are the pros and cons I retained from Max's book and made mine before designing. As I wrote in my introduction, my design is a little complex because I wanted (and needed) to be consistent with my design choice: saying that you chose sockets over RMI for performance and scalability considerations forces you to come up with a performant and scalable solution. But reading Andrew's post, I noticed that it's possible to justify a sockets choice quite differently, saying for example "I chose sockets over RMI for simplicity, code easier for junior programmers to understand, and lightweight system administration". With such a justification, you may (or even must) come up with a simpler design than mine.
(+?) standard (sockets are, serialization isn't)
Sockets are not serialization: sockets abstract the end-points of a TCP/IP communication (a standard), while serialization (in the network field) is just one Java way of doing object marshalling.
(-) difficult to implement when it is multi-threaded
A server based on sockets is always multithreaded (at least if you want to be able to serve more than one client at a time, which is the common job of any server).
can you please tell me what made you jump over to sockets ?
Really and honestly? Just the desire to learn a technology I had no previous experience in. I could say the same about RMI, but sockets involve more different interesting things to handle, IMO. Some people here seem to systematically avoid all the technical difficulties they encounter when they can do so: the simpler the better. Other people, on the contrary, seem to systematically dig into them as if they were masochists. Actually, I think that both approaches, though opposite, are very defensible. It really depends on your personal context. If you are an experienced Java developer, working for a few years in the Java world, and simply willing to get SCJD certified, your aim will probably be to get it at "low cost" (time and energy). If, like me, you are an experienced developer/coach in other languages / environments, but a complete newbie in Java, and you decided to move to Java and get certified, your goal is quite different: you want to get certified and experienced. And from my own experience, I know there is no miracle out there: your experience grows as you solve problems, not by circumventing them. At 43, it's (unfortunately!) far too late for me to get hired as a junior. There are many open Java positions in Belgium at the moment (that's obviously why I decided to move to Java), but all the senior-level ones require - statement of the obvious too - strong previous experience. I hope this explains it. Best, Phil.