I'm building a multi-threaded Java application which, in simple terms, reads data from the database, does some processing and writes it to a file. We're planning to distribute it across the 3 UNIX servers we have. I just need some input on which Java frameworks would be best for this.
Question: Are you absolutely sure that you need to distribute the processing workload?
Out of those three tasks, reading from a DB, doing some processing, and writing to a file, I wouldn't immediately consider that the processing part is going to be the bottleneck in the whole operation. Reading from a DB is slow and writing to a file is really slow. What you need to do here is to get it working on a single machine and then profile the performance of the operation to find out, for a fact, where the bottleneck is. Only then can you know for sure what you can do to speed things up.
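As a rough illustration of the "get it working, then measure" advice, here is a minimal sketch of timing each stage separately on a single machine. The `StageTimer` class and the stage names are hypothetical, not part of any framework, and a real profiler would give far more detail than this.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: accumulates wall-clock time per named stage.
class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Runs the stage, adds its elapsed time to that stage's total, and returns its result.
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Accumulated nanoseconds per stage, in the order the stages were first seen.
    Map<String, Long> totals() {
        return nanosByStage;
    }

    // Name of the stage with the largest accumulated time, or null if nothing was timed.
    String slowestStage() {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : nanosByStage.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }
}
```

Wrapping the three steps as `timer.time("read", ...)`, `timer.time("process", ...)` and `timer.time("write", ...)` and then checking `timer.slowestStage()` gives a first hint of where the time goes before any decision about distributing the work.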
Question: Are you sure Java is the right tool for the job?
If the processing part is just some text manipulation then I might recommend you use a scripting language to do that task. Languages like Groovy or Python might be better suited. Particularly Groovy if you're used to writing Java as the syntax is pretty similar.
Tim Driven Development
Joined: Jun 17, 2014
Hi Tim, Thanks for the response.
We currently run this process in Perl, but the amount of data to be processed is growing (currently in the millions), so we thought that instead of reading from the database sequentially, we could read in parallel and process the data.
Our idea in distributing this process is to balance the load evenly across our servers, as some of them are underutilized.
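One way the "read in parallel, balanced evenly" idea could look is to split the table's key range into one contiguous slice per server. The sketch below assumes the table has a numeric, reasonably dense primary key, which the thread does not state; `KeyRangePartitioner` and `Range` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an inclusive key range into near-equal slices.
class KeyRangePartitioner {

    // Inclusive range of key values assigned to one worker.
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    // Slice sizes differ by at most one; if there are fewer keys than workers,
    // fewer slices are returned.
    static List<Range> split(long minId, long maxId, int workers) {
        if (workers <= 0 || maxId < minId) {
            throw new IllegalArgumentException("need workers > 0 and minId <= maxId");
        }
        long total = maxId - minId + 1;
        long base = total / workers;
        long extra = total % workers; // the first 'extra' slices get one more key
        List<Range> ranges = new ArrayList<>();
        long from = minId;
        for (int i = 0; i < workers; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                continue;
            }
            ranges.add(new Range(from, from + size - 1));
            from += size;
        }
        return ranges;
    }
}
```

Each server would then read only its own slice, for example with a query of the form `WHERE id BETWEEN ? AND ?` (the column name `id` is an assumption). If the keys are sparse or skewed, equal key ranges will not mean equal row counts.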
Assuming you've done all your profiling and have concluded that the processing part is slowing you down, then I'm afraid this is where my usefulness ends. I'm not that familiar with writing distributed Java apps and haven't used any frameworks for it either.
Joined: Jun 17, 2014
Thanks Tim. Does anybody else have any thoughts on this? Please let me know.
Since Sun's motto was "the network is the computer," Java has plenty of resources for network computing. Personally, I found the JavaSpaces concept rather attractive. Search for "Javaspaces open source" or "Gigaspaces".
Another approach might use JMS - Java Message Service.
Joined: Mar 22, 2005
I found the JavaSpaces concept rather attractive.
These days, the Jini and JavaSpaces projects are carried on as part of Apache River.
Java's built-in RMI does this well out of the box. If you want more features, there's a bunch of stuff for dynamic service discovery, distributed event handling, and even transactions (and the JavaSpaces mentioned earlier in this thread) in what is now the Apache River project, which used to be a Sun project called Jini. Both RMI and Jini are what I like to describe as "real distributed object-oriented". By that I mean you can pass polymorphic arguments across calls, passing objects of classes never previously seen by the recipient (and yes, the security manager must be in place, so the newly introduced code can be prevented from running amok). You can have instances of distributed/remotely accessible objects created dynamically (not just "this is the server") and basically _everything_ that is normal for OO in a single process space.
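For anyone who hasn't seen plain RMI, here is a minimal sketch of exporting an object and calling it through a registry. It runs the "server" and "client" halves in one JVM on the loopback address purely for demonstration; the `RecordProcessor` interface, the `"processor"` binding name and the port are made up for illustration, and it does not show dynamic class loading or anything from Apache River.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote interface: every remote method must declare RemoteException.
interface RecordProcessor extends Remote {
    String process(String record) throws RemoteException;
}

// Hypothetical implementation that would live on the worker machine.
class UpperCaseProcessor implements RecordProcessor {
    @Override
    public String process(String record) {
        return record.toUpperCase();
    }
}

class RmiSketch {
    // Exports a processor, looks it up via the registry, and makes one remote call.
    static String roundTrip(int port, String record) throws Exception {
        // Demo-only assumption: advertise the loopback address so that the
        // sketch works on a single machine without relying on name resolution.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");

        RecordProcessor impl = new UpperCaseProcessor();
        // "Server" side: export on an anonymous port and bind the stub under a name.
        RecordProcessor stub = (RecordProcessor) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(port);
        try {
            registry.rebind("processor", stub);

            // "Client" side: normally a different JVM on a different host.
            Registry remote = LocateRegistry.getRegistry("127.0.0.1", port);
            RecordProcessor client = (RecordProcessor) remote.lookup("processor");
            return client.process(record);
        } finally {
            // Unexport so that RMI threads do not keep the JVM alive.
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }
}
```

In a real deployment the export and `rebind` would run on each worker server and the lookup on whichever machine hands out the work, with the worker's host name in place of the loopback address.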
You should try Apache Storm for parallel processing. The other option is to write a good Java app that does a lot of multi-threading using the Executor interface, but if I were you I would try Apache Storm.
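For the plain-Java option, a minimal sketch of fanning the processing step out over a fixed thread pool with `ExecutorService` might look like this. `ParallelBatchProcessor` and its parameters are hypothetical, and the sketch assumes the rows have already been read into memory and that the processing step is thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: applies one processing step to every row on a thread pool.
class ParallelBatchProcessor {

    // Returns the results in the same order as the input rows.
    static <T, R> List<R> processAll(List<T> rows, Function<T, R> step, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per row; the pool runs up to 'threads' of them at once.
            List<Future<R>> pending = new ArrayList<>();
            for (T row : rows) {
                Callable<R> task = () -> step.apply(row);
                pending.add(pool.submit(task));
            }
            // Collect in submission order, so output order matches input order.
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a processing task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With millions of rows, submitting batches of rows per task rather than single rows would probably keep the scheduling overhead down. This only parallelizes within one JVM; spreading work over several servers still needs something like the options mentioned above.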