We are in the process of modifying and optimizing a project. The existing project works as follows.
After the user clicks the start button, data is collected from the database and many calculations are run. For example, assume there are about 10 steps, and each step can take around an hour or a little less, depending on the data. At present the calculations run sequentially, which wastes time.
Also, many of the 10 processes are independent (i.e. they do not need the results of the previous calculations). So we were discussing whether we could use multi-threading for the calculations: Thread 1 does Process 1 while Thread 2 does Process 2, and so on. Is this a feasible solution, and will it improve performance?
We have only talked about it so far and are in the process of testing the idea before implementing it. But because of time constraints, I wanted to ask for the experts' advice.
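For anyone who wants to try the idea quickly, here is a minimal sketch of running independent steps on a thread pool with plain java.util.concurrent. The names calculateStep and runSteps are made up for illustration, and the arithmetic loop is only a stand-in for a real calculation; it assumes the steps share no mutable state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSteps {

    // Hypothetical stand-in for one independent calculation step.
    static long calculateStep(int stepNumber) {
        long sum = 0;
        for (int i = 1; i <= 1_000_000; i++) {
            sum += (long) i * stepNumber;
        }
        return sum;
    }

    // Runs steps 1..stepCount on a fixed-size pool and returns the results in step order.
    static List<Long> runSteps(int stepCount, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int step = 1; step <= stepCount; step++) {
                final int stepNumber = step;
                tasks.add(() -> calculateStep(stepNumber));
            }
            // invokeAll blocks until every task has finished.
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : pool.invokeAll(tasks)) {
                // get() rethrows a failure from the step as an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // One thread per available core is a common starting point for CPU-heavy work.
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(runSteps(10, threads));
    }
}
```

Steps that do depend on an earlier result would have to be submitted only after that result is available, so a sketch like this covers the independent ones only.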
Thanks, Edward, for the info. Though I would like to push the processes out to other machines, I think we need to make it work on a single machine only. Even then I feel it should work, since I think one process is not fully utilizing the CPU.
But I will certainly look into the link you provided.
On a single processor, multiple threads can do the work in less elapsed time than a single thread only if that single thread spends time waiting on something, typically disk or network IO. If the process is CPU bound, multiple threads on one CPU won't help much. On more CPUs or multiple machines there is surely some opportunity.
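To make the point about waiting concrete, here is a small sketch in which each task only sleeps, as a stand-in for disk or network IO. The class and method names are hypothetical, and Thread.sleep is an assumption standing in for real waiting. Because the tasks spend their time waiting rather than computing, the threaded run should finish in roughly the time of one task, even on a single CPU.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WaitingTasks {
    static final int TASKS = 4;
    static final long WAIT_MILLIS = 100;

    // Hypothetical task that only waits, standing in for disk or network IO.
    static void waitForIo() throws InterruptedException {
        Thread.sleep(WAIT_MILLIS);
    }

    // Runs the tasks one after another and returns the elapsed milliseconds.
    static long sequentialMillis() throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < TASKS; i++) {
            waitForIo();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Runs the same tasks on one thread each and returns the elapsed milliseconds.
    static long threadedMillis() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(TASKS);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < TASKS; i++) {
                tasks.add(() -> {
                    waitForIo();
                    return null;
                });
            }
            long start = System.nanoTime();
            pool.invokeAll(tasks); // blocks until all tasks are done
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sequential ms: " + sequentialMillis());
        System.out.println("threaded ms:   " + threadedMillis());
    }
}
```

If waitForIo were replaced with a pure calculation, the two timings on a single-CPU box would be expected to come out about the same, which is the CPU-bound case described above.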
Distributing work and gathering results sounds really interesting. I wonder if you could build a communications protocol between threads that would work locally and scale up to work between processes as well, e.g. something based on sockets. I'm sure somebody has done this out in open source land; whether that is something you'd want to use or not might be a different question.
I was just reading some Google papers on their map-reduce algorithms. The Lucene project has spun off Hadoop, which is an open source map-reduce implementation. I don't think it applies to your problem, but the design may give you some ideas.
Are you sure the program will always run on a single-processor box (maybe it is embedded)? If so, it may not be worth the development time. However, if you are running it on a common PC, I think it is worth the time: if you buy a new machine, it will most likely be at least dual-core. That's the direction the industry is taking, so I guess multi-threading will soon be second nature to developers like us.
Check you're not doing a lot of reading and writing to the database as well. Often, if the work is database intensive, the best optimisations are made in the database itself. On an Oracle database, for example, the best optimisations come from running as much processing as possible natively in the database, e.g. PL/SQL on the database that is callable from Java, the database's native JVM, and table optimisations. I recently had to optimise a very parallel system (20+ Java threads) on a 4-CPU machine, but the real speed-ups came from optimising what I did on the database, as the threads were at first pulling data out, processing it, putting it back, reading some more, and so on. Also remember you have only one database resource (unless it's read-only operations and you can duplicate it :-) ), so this can be a thread bottleneck. Check that no process takes database locks that could lock out another thread.
If you do only a small one-off database read, the database transfer is quick, and you have only one CPU that is basically 100 per cent utilised by each calculation as it runs, then you could actually end up with slower code because of the overhead of threads. Also consider how much memory each process requires relative to the total system memory. Have a look at your CPU utilisation as your single-threaded version runs; if it's near 100 per cent and you have only one CPU, optimise your algorithm instead.
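Besides watching the operating system's monitor, one rough way to check this from inside Java is to compare the CPU time a thread used against the wall-clock time that passed, using the standard ThreadMXBean. This is only a sketch: cpuFraction and simulatedWait are hypothetical names, and it assumes the JVM supports per-thread CPU timing. A fraction near 1.0 suggests the step is CPU bound; a fraction near 0 suggests it mostly waits (for example on the database).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuBoundCheck {

    // Returns CPU time divided by elapsed time for the task, measured on the calling thread.
    static double cpuFraction(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not available");
        }
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return (double) cpuNanos / wallNanos;
    }

    // Hypothetical stand-in for a step that mostly waits, e.g. on a database call.
    static void simulatedWait(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        double fraction = cpuFraction(() -> simulatedWait(200));
        System.out.println("CPU fraction while waiting: " + fraction);
    }
}
```

Wrapping one real calculation step in cpuFraction should give a first hint of whether extra threads on the same CPU have any waiting time to fill.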
'Optimise in haste, repent at leisure'
As an alternative to sockets, consider Remote Method Invocation (RMI). RMI allows two computers, both running Java, to exchange objects and call methods on objects that live on the other computer.
For example, if you made a query to a database server that was also an RMI server, the result set could remain on the server and the client would get back just a reference to the object that remains on the server. This could save a lot of data from going across your network.
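As a rough illustration of that idea, the sketch below shows what such a remote interface might look like. RemoteQueryResult and InMemoryQueryResult are hypothetical names, and the in-memory list is only a stand-in for a real result set. Exporting the object (for example with UnicastRemoteObject.exportObject) and binding it in an RMI registry are left out on purpose, so the sketch runs without any network setup.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

// Hypothetical remote view of a query result that stays on the server.
// A client holding a stub of this interface receives only the small return values.
interface RemoteQueryResult extends Remote {
    int rowCount() throws RemoteException;

    double sumOfColumn() throws RemoteException;
}

// Server-side implementation; the rows themselves never leave this object.
class InMemoryQueryResult implements RemoteQueryResult {
    private final List<Double> rows;

    InMemoryQueryResult(List<Double> rows) {
        this.rows = rows;
    }

    @Override
    public int rowCount() {
        return rows.size();
    }

    @Override
    public double sumOfColumn() {
        double total = 0;
        for (double value : rows) {
            total += value;
        }
        return total;
    }
}
```

The design point is that each method returns a small summary rather than the rows, so only the answer crosses the network once the object is actually exported.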
Thanks for all the replies, guys.
Actually, I was thinking of implementing multi-threading in plain Java, without any external tools or open source libraries. Will that be a problem? I will try that first and then move on to the tools you mentioned.
I was also told that the server might be dual core, so maybe it will work if implemented properly.
Chris also made a good point about the thread bottleneck. We have to discuss that and make sure it's not a problem.
Thanks, guys. I will try it out using Java multi-threading first and let you know how it goes.