Although I've taken several college classes, I'm relatively new to Java, in the sense that I don't have much real-world experience. I'm writing a program for work (a rogue effort meant more for me than for them, but it is practical and gives me some experience) that requires the creation of millions of PDF files. I need some help deciding on the best design and the best Java toolset to use.
The current system uses a single desktop computer and takes about 2 weeks from start to finish. In addition, because there are hundreds of gigabytes of data, it takes about a day to transfer the files up to the network and another day to transfer them to our public-facing web server, and then we have to copy the files to a FireWire drive and ship them to the printer. This process is archaic (I didn't have anything to do with it) and causes a lot of stress for the department.
My idea:
Create a Java server that will queue the records from the database to be picked up by clients. The server will also log information received from the clients, such as the time taken to process each PDF and the URL of the PDF.
Create a Java client that will run on individual workstations, grab a record from the server, create the PDF file, and upload the file to some cloud storage.
Create a management console that can be used to configure certain variables and start the process.
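To make the idea above concrete, here is a rough in-process sketch of the flow: a queue of record IDs stands in for the server, a thread pool stands in for the workstations, and each worker reports back the time taken and a URL. All names here (WorkFlowSketch, Result, createAndUploadPdf, the storage.example.com URL) are hypothetical placeholders, and there is no real networking, PDF generation, or upload in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process stand-in for the server/client design described above.
class WorkFlowSketch {

    // What a client would report back to the server after finishing one record.
    static final class Result {
        final int recordId;
        final String pdfUrl;
        final long millis;

        Result(int recordId, String pdfUrl, long millis) {
            this.recordId = recordId;
            this.pdfUrl = pdfUrl;
            this.millis = millis;
        }
    }

    // Placeholder for "create the PDF and upload it"; the URL is invented.
    static String createAndUploadPdf(int recordId) {
        return "https://storage.example.com/pdfs/" + recordId + ".pdf";
    }

    // Runs `clients` workers over `recordCount` records and returns the server-side log.
    static List<Result> run(int recordCount, int clients) {
        Queue<Integer> pending = new ConcurrentLinkedQueue<>();
        for (int id = 1; id <= recordCount; id++) {
            pending.add(id);
        }
        Queue<Result> log = new ConcurrentLinkedQueue<>();

        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int c = 0; c < clients; c++) {
            pool.submit(() -> {
                Integer id;
                // poll() returns null once no records are left, which ends this worker.
                while ((id = pending.poll()) != null) {
                    long start = System.nanoTime();
                    String url = createAndUploadPdf(id);
                    long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    log.add(new Result(id, url, millis));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        List<Result> log = run(20, 4);
        System.out.println("Processed " + log.size() + " records");
    }
}
```

Because every worker pulls its next record from the shared queue, no record is handed out twice, and faster workstations simply end up processing more records.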
By doing this we can use a couple of hundred idle workstations to process the PDF files, which should save us a lot of time in creating the PDFs. In addition, having the clients copy the PDF files to cloud storage would reduce the number of copies. Our internal and external websites can then just reference the URLs supplied by the clients.
For the PDF creation I plan to use iText, and I am about 60% finished with my class that creates the PDFs.
What I'm struggling with is how best to set up the interaction/coordination between the clients and the server. Should I use RMI? If not, what would be best in this situation? I'm not copying files from the client to the server, just data (record objects from the database). Also, what about coordinating the traffic between all of the clients and the server? What would be the best way to handle that? The last thing is queuing the records. Should I read all of the records into memory on the server, or should I just keep a few hundred in the queue at any one time? I'm looking for "best practice" information.
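For reference, the "keep a few hundred in the queue" option could be sketched with a bounded java.util.concurrent.ArrayBlockingQueue, where the thread reading from the database blocks whenever the queue is full. This is only a sketch under assumptions: the class name, the integer record IDs, and the POISON stop marker are invented for illustration, and plain threads stand in for the remote clients.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a bounded queue keeps at most `capacity` records in memory.
class BoundedQueueSketch {

    // Sentinel value telling a consumer to stop; real record IDs are assumed positive.
    static final int POISON = -1;

    // Feeds `totalRecords` IDs through the queue and returns how many were consumed.
    static int run(int totalRecords, int capacity, int consumers) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger processed = new AtomicInteger();

        Thread[] workers = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            workers[i] = new Thread(() -> {
                try {
                    // take() blocks while the queue is empty.
                    while (queue.take() != POISON) {
                        processed.incrementAndGet(); // stand-in for creating one PDF
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }

        try {
            // Stand-in for reading rows from the database one at a time.
            for (int id = 1; id <= totalRecords; id++) {
                queue.put(id); // blocks while the queue already holds `capacity` items
            }
            // One stop marker per consumer so every worker terminates.
            for (int i = 0; i < consumers; i++) {
                queue.put(POISON);
            }
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        int done = run(10_000, 200, 4);
        System.out.println("Consumed " + done + " records");
    }
}
```

With this shape the server's memory use depends on the queue capacity rather than on the total number of records, at the cost of the producer occasionally waiting for the consumers to catch up.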
I know this is a big undertaking for a beginner, but you gotta learn somehow. I would very much appreciate your input/advice. Thanks!