I'll give this a try, too. I'll write as a general introduction -- feel free to skip over any bits you are already familiar with.
In a traditional computer program, execution starts at the beginning and runs until the program terminates, either by completing its task or by encountering an error. It runs on a single computer, and does one thing at a time. This is the way that programming is described in most introductory courses, and the style of most people's first programs.
However, this is not really a good match for the real world. Many types of computer problem need to do more than one thing at once, or to be accessed by more than one person at once. Most modern word-processors, for example, allow you to print one document while editing another. Doing more than one thing at once is known as concurrent programming, and is best done in Java using Threads.
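For example, here is a minimal sketch (the class name and the pretend 'print job' are made up for illustration) of how a Java program can start a second Thread so that one piece of work carries on while another runs:

```java
// A word-processor-flavoured example: run a long 'print job' on its own
// thread so that 'editing' can carry on at the same time.
public class PrintWhileEditing {
    public static void main(String[] args) throws InterruptedException {
        Thread printJob = new Thread(new Runnable() {
            public void run() {
                for (int page = 1; page <= 5; page++) {
                    System.out.println("Printing page " + page);
                    try {
                        Thread.sleep(500);   // pretend each page takes a while
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });

        printJob.start();                    // printing begins in the background

        for (int i = 0; i < 5; i++) {
            System.out.println("Editing the other document...");
            Thread.sleep(300);
        }

        printJob.join();                     // wait for the print job before exiting
    }
}
```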
Even this, however, is only part of the story. If your program needs to do a lot of things at once, it can need a very large and powerful machine to do it, and such a machine is correspondingly expensive. 'Mainframe' computers like this can cost well over a million dollars each. In a world where many computers are connected together, either by the internet, by local networks, or by direct cable or modem connections, there is another option. If you can split the problem into small parts, each part can be run on a smaller, cheaper machine. To do this, however, these smaller machines need to be able to communicate with each other -- giving instructions and passing data between them. This is 'Distributed Computing'.
Distributed computing takes many forms. Probably the most familiar is the World Wide Web. Each time you click a hyperlink on a web page, the browser software running on your machine makes a request to the server software running on another; the server then processes the request from your machine and gives back some information to display. With a bulletin board such as this one, the processing on the server is even more complex, but the interface between the two machines is still the same. This is a key point.
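To make that concrete, here is a minimal sketch of the browser's side of that conversation in Java, using the standard HttpURLConnection class (the URL is just a placeholder):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// A browser-like client: connect to a web server, send a GET request,
// and print whatever the server sends back for display.
public class FetchPage {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.example.com/");   // placeholder address
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // the 'information to display'
        }
        in.close();
        conn.disconnect();
    }
}
```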
Although the HTTP protocol shared by web browsers and web servers gives you a distributed system, it is not a very flexible one. The number of different requests is limited, and can't easily be extended. This is where systems like RPC, DCE, DCOM, RMI and CORBA come in.
One of the earliest widely-adopted distributed computing standards was Sun's RPC protocol. This allowed one computer (typically a Unix system running one or more C programs) to make 'procedure calls' to another machine in a similar manner to local procedure or function calls. Hence the name; RPC stands for Remote Procedure Call.
This standard was quite popular, and is still used in a lot of software running on Unix systems. One thing that RPC introduced was the concept of IDL, or Interface Definition Language. In order for both ends of the Remote Procedure Call to understand what is required, they have to share some information about what the procedure is called, and what the types and names of the parameters are. This is described by the IDL.
To write a program using RPC, you first write the IDL for a procedure and then compile it using a separate software tool into two parts: a 'stub' for the system which calls the procedure, and a 'skeleton' for the system which provides the implementation. The 'stub' provides a local function which your software can call, and which takes care of all the low-level communication with the remote system. The 'skeleton' provides a similar service for the other end; it provides somewhere to put your code which will be invoked by the remote call. Once this has been done, just as in the web example, you can have as many different implementations of the RPC as you like, on as many machines as you like, called by as many different machines as you like, as long as each 'client' system uses the correct 'stub', and each 'server' system implements the correct 'skeleton'.
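Real stubs and skeletons are generated by the IDL compiler, but to give a feel for what the generated code does, here is a hand-written sketch of the calling side for an imaginary add(a, b) procedure (the procedure name, host and port are made up, and this wire format is not the real RPC one):

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

// A hand-written sketch of what a generated 'stub' does: it presents an
// ordinary local method, but behind the scenes it marshals the call onto
// the network and unmarshals the reply.
public class AddStub {
    private final String host;
    private final int port;

    public AddStub(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Looks like a local call to the program using it...
    public int add(int a, int b) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            DataInputStream in = new DataInputStream(socket.getInputStream());

            out.writeUTF("add");   // which remote procedure to invoke
            out.writeInt(a);       // marshal the arguments
            out.writeInt(b);
            out.flush();

            return in.readInt();   // unmarshal the result
        } finally {
            socket.close();
        }
    }
}
```

The 'skeleton' at the other end does the mirror image: it reads the procedure name and arguments from the connection, calls the real implementation, and writes the result back.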
RPC is all well and good. It is much more flexible and powerful than the likes of HTTP, but it still has a lot of problems. In order for a program on one machine to call a procedure on another, it must know about the other machine, and systems built with RPC are not very easy to alter by adding or removing machines, or by moving processing from one server to another, while the system is running.
These problems were the drive behind DCE, the Distributed Computing Environment. DCE was an ambitious project begun at Digital Equipment Corp (DEC) and continued by the Open Software Foundation (OSF). It aimed to provide a similar IDL-based remote procedure call system, but one in which machines could be added and removed while the system was running, and which would let a client system look up an appropriate server by the services it provides, rather than by the name or address of the machine. It also adds extra levels of security, and ways for closely-coupled groups (or 'cells' in DCE terms) of machines to communicate with other groups.
DCE was popular for a while, but only really continues in legacy projects these days. It is notoriously difficult to administer as well as being large and expensive to install. Most new projects use one of the more modern systems, such as DCOM, CORBA or RMI.
These three protocols, and no doubt others, extend the basic RPC idea in a different direction: an object-oriented direction. In these systems the client doesn't just make a procedure call on a remote machine, but invokes a method on a remote object. This is a subtle and powerful extension, and is much better at integrating the ideas of concurrent programming and distributed programming. With these systems you can create multiple objects on one or more server machines which support the same interface, and invoke their methods from any machine.
DCOM is the distributed (hence D) form of Microsoft's Component Object Model (COM) and is designed to allow invocation and use of COM objects between machines. CORBA (Common Object Request Broker Architecture) is a unified protocol which gathers together several other manufacturers' attempts at object-oriented distributed systems. It is a sufficiently broad standard to allow client and server software to be written in a variety of languages, and is very popular for adding a distributed layer to existing systems written in the likes of C or COBOL.
RMI (Remote Method Invocation) is simpler than CORBA, but is specific to Java.
I don't know much about DCOM, but in a Java context the main choice is between CORBA and RMI anyway.
In a CORBA system, each machine (client and server) runs an ORB (Object Request Broker) which mediates between machines. Each ORB may register itself, and the objects it manages, with other ORBs on the network. When a client wishes to call a method on a remote object, it calls its local ORB, which looks up which server(s) can provide an implementation of that object, and forwards the request to the ORB on that machine. The server ORB then invokes the appropriate object method. Thus CORBA may be described as decentralized: each ORB knows about other ORBs. ORBs communicate with each other using IIOP (the Internet Inter-ORB Protocol).
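As a sketch only: with the Java IDL ORB bundled with the JDK, the client side of such a call looks roughly like this. The 'Hello' interface and its 'HelloHelper' class are imaginary here and would normally be generated from an IDL file by the idlj compiler, so this shows the shape of the code rather than something to compile as-is:

```java
import org.omg.CORBA.ORB;
import org.omg.CosNaming.NameComponent;
import org.omg.CosNaming.NamingContext;
import org.omg.CosNaming.NamingContextHelper;

// Sketch of a CORBA client. Hello and HelloHelper are assumed to have been
// generated from an IDL definition by idlj.
public class CorbaClient {
    public static void main(String[] args) throws Exception {
        // Start a local ORB; the command-line args tell it where to find things.
        ORB orb = ORB.init(args, null);

        // Ask the ORB for the naming service and look up the remote object.
        org.omg.CORBA.Object objRef = orb.resolve_initial_references("NameService");
        NamingContext naming = NamingContextHelper.narrow(objRef);
        NameComponent[] path = { new NameComponent("Hello", "") };
        Hello hello = HelloHelper.narrow(naming.resolve(path));

        // From here on it reads like an ordinary method call.
        System.out.println(hello.sayHello());
    }
}
```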
RMI, on the other hand, gains its simplicity by having only one 'registry', shared between several machines. Any Java program which wishes to take part in RMI must register its remote objects with the registry. When a client wishes to call a method on a remote object, it looks the object up in the registry, which hands back a stub; calls made through that stub then go directly to the appropriate object on the appropriate system.
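Here is a minimal RMI sketch showing the three pieces involved: a shared remote interface, a server which registers an implementation, and a client which looks it up and calls it. The 'Greeter' name and the URL are made up for illustration, and an rmiregistry is assumed to be running on the default port:

```java
import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// The shared interface -- both the client and the server need this class.
interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

// The server-side implementation of the remote object.
class GreeterImpl extends UnicastRemoteObject implements Greeter {
    GreeterImpl() throws RemoteException {
        super();
    }

    public String greet(String name) throws RemoteException {
        return "Hello, " + name;
    }
}

// The server: create the object and register it with the RMI registry.
class GreeterServer {
    public static void main(String[] args) throws Exception {
        Naming.rebind("rmi://localhost/Greeter", new GreeterImpl());
        System.out.println("Greeter bound and waiting for calls");
    }
}

// The client: look the object up in the registry and invoke a method on it.
class GreeterClient {
    public static void main(String[] args) throws Exception {
        Greeter greeter = (Greeter) Naming.lookup("rmi://localhost/Greeter");
        System.out.println(greeter.greet("world"));
    }
}
```

With older JDKs you would also run rmic over GreeterImpl to generate the stub class used on the client side.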
RMI usually uses its own transport protocol, but as Paul mentions, most (but not all, yet) of RMI's features are available if RMI uses the CORBA IIOP protocol. This should also allow RMI registries to communicate with CORBA ORBs.
Useful books:
Java Distributed Computing (O'Reilly, ISBN: 1565922069)
Client/Server Programming with Java and CORBA (John Wiley & Sons, ISBN: 047124578X)
I hope some of the above helps,
Frank.