Although I've taken several college classes, I'm relatively new to Java, in that I don't have a lot of real-world experience. I'm writing a program for work (a rogue effort meant more for me than for them, but it is practical and gives me some experience) that requires the creation of millions of PDF files. I need some help deciding on the best design and the best Java toolset to use.
The current system uses a single desktop computer and takes about two weeks from start to finish. In addition, because there are hundreds of gigabytes of data, it takes about a day to transfer the files up to the network and another day to transfer them to our public-facing web server, and then we have to copy the files to a FireWire drive and ship them to the printer. This process is archaic (I didn't have anything to do with it) and causes a lot of stress for the department.
Here is what I'd like to do instead:
1. Create a Java server that queues the records from the database to be picked up by clients. The server will also log information received from the clients, such as the time taken to process each PDF and the URL of the PDF.
2. Create a Java client that runs on individual workstations, grabs a record from the server, creates the PDF file, and uploads the file to some cloud storage.
3. Create a management console that can be used to configure certain variables and start the process.
By doing this we can use a couple of hundred idle workstations to process the PDF files, which should save us a lot of time in creating them. In addition, having the clients copy the PDF files to cloud storage would reduce the number of copies. Our internal and external websites can then just reference the URLs supplied by the clients.
For the PDF creation I plan to use iText, and I am about 60% finished with my class that creates the PDFs.
What I'm struggling with is how best to set up the interaction and coordination between the clients and the server. Should I use RMI? If not, what would be best in this situation? I'm not copying files from the client to the server, just data (record objects from the database). Also, what about coordinating the traffic between all of the clients and the server? What would be the best way to handle that? The last thing is queueing the records. Should I read all of the records into memory on the server, or should I just keep a few hundred in the queue at any one time? I'm looking for "best practice" information.
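To make the "keep a few hundred in the queue" option concrete, here is a minimal sketch using a bounded `LinkedBlockingQueue` from `java.util.concurrent`. The class name `BoundedQueueSketch`, the `String` record type, the `END` sentinel, and the capacity values are all assumptions for illustration; a real server would read rows from the database and hand them to client handler threads.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: names, record type, and capacity are assumptions.
class BoundedQueueSketch {
    // Sentinel value telling the consumer that no more records will arrive.
    private static final String END = "__END__";

    // Pushes totalRecords fake records through a queue that never holds more
    // than `capacity` items, and returns how many the consumer received.
    static int process(int totalRecords, int capacity) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(capacity);

        // Producer: stands in for the thread that reads rows from the database.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < totalRecords; i++) {
                    queue.put("record-" + i); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int processed = 0;
        try {
            // Consumer: stands in for handing records out to clients.
            while (true) {
                String record = queue.take(); // blocks while the queue is empty
                if (record.equals(END)) {
                    break;
                }
                processed++;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(process(10_000, 200));
    }
}
```

With this shape the server's memory use is bounded by the queue capacity rather than by the total number of records, at the cost of keeping the database reader running for the whole job.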
I know this is a big undertaking for a beginner but you gotta learn somehow. I would very much appreciate your input/advice. Thanks!
Well, I'm about 90% finished with this project. I don't know why I had no responses to my question about the different approaches, but here is what I ended up doing. I wrote a server that listens for clients to connect. Once a client connects, a new thread is created and the socket info is handed over to another class that I called the client manager. The client manager loads all of the records from the database into a static ConcurrentLinkedQueue. The client manager has an ObjectOutputStream that passes the record to the client's ObjectInputStream. I also wrote a very simple protocol that reports the state of the client. I've run several tests. Each client can do 40,000 PDFs in five minutes. One of the other nice features is that it requeues a record if the client gets disconnected. In the current system they have to start from scratch. This is working really well. With thirteen clients we should be able to generate the PDFs in 5 minutes.
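The requeue-on-disconnect behavior described above can be sketched roughly as follows. This is not the actual project code: `RequeueSketch`, `RecordSender`, and `dispatchOne` are hypothetical names, and `RecordSender` stands in for writing a record to a client's `ObjectOutputStream`. The sketch only covers a failure while sending; detecting a client that disconnects after it has received a record would need extra bookkeeping.

```java
import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: all names here are assumptions for illustration.
class RequeueSketch {
    // Stand-in for writing a record to a connected client's output stream.
    interface RecordSender {
        void send(String record) throws IOException;
    }

    // Takes one record off the queue and tries to send it. If the send fails
    // (for example because the client disconnected), the record is put back
    // so another client can pick it up. Returns true if a record was delivered.
    static boolean dispatchOne(Queue<String> queue, RecordSender sender) {
        String record = queue.poll(); // null when the queue is empty
        if (record == null) {
            return false;
        }
        try {
            sender.send(record);
            return true;
        } catch (IOException e) {
            queue.add(record); // requeue instead of losing the record
            return false;
        }
    }

    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.add("record-1");

        // Simulate a client whose connection drops during the send.
        dispatchOne(queue, r -> { throw new IOException("client disconnected"); });
        System.out.println("still queued: " + queue.size());

        // Simulate a healthy client.
        dispatchOne(queue, r -> { });
        System.out.println("still queued: " + queue.size());
    }
}
```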
So the new problem is that I'm trying to make this nice and pretty with a GUI interface. I have a form that starts and stops the server. The problem is that when I stop the server, the ConcurrentLinkedQueue does not release its memory, so it still uses 1.2 GB of RAM. If I then click the start server button, it appends the records to the queue. Here's the pertinent code:
This probably has to do with a thread locking the queue, or perhaps the queue is still being referenced from somewhere, but I cannot see it. Can someone here offer advice? Thanks!
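Since the original code is not shown, the following is only a guess at the cause, sketched under stated assumptions. A `static` field stays reachable for as long as its class is loaded, so a static queue keeps every record alive after the server stops, and loading again adds to whatever is already there. One possible remedy is to clear the queue on stop and before each load. `ServerLifecycleSketch` and its methods are hypothetical names, and the records are faked as strings.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: names are assumptions, not the original server code.
class ServerLifecycleSketch {
    // Static, so it lives as long as the class does; stopping server threads
    // alone does not make its contents eligible for garbage collection.
    private static final Queue<String> RECORDS = new ConcurrentLinkedQueue<>();

    // Loads records for a fresh run. Clearing first avoids appending to
    // leftovers from a previous run.
    static void start(int recordCount) {
        RECORDS.clear();
        for (int i = 0; i < recordCount; i++) {
            RECORDS.add("record-" + i);
        }
    }

    // Drops all queued records so the garbage collector can reclaim them.
    static void stop() {
        RECORDS.clear();
    }

    static int queued() {
        return RECORDS.size();
    }

    public static void main(String[] args) {
        start(1_000);
        stop();
        start(1_000);
        System.out.println(queued());
    }
}
```

Even after the queue is cleared, the JVM may keep the heap it has already claimed from the operating system, so the process's memory figure in a task manager can stay high although the records themselves have been collected. An alternative design is to make the queue an instance field of a server object that is created on start and discarded on stop.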