I wanted to get your thoughts on the following. I recently came across Hadoop, an open-source MapReduce framework. We currently have a standalone Java application. Its target users invoke it with input details through some front-end mechanism that our client will develop; our part was just to develop the Java app. The app performs some mathematical calculations after interacting with the database through Hibernate, an object-relational mapping framework. Now there is a new requirement: the application must support another invocation mechanism, batch jobs. A file will contain 1000 or more invocations, and we have to execute the application once for each of them. With the current batch process, the invocations would run in sequential order. Hadoop, the MapReduce-style implementation I mentioned, divides the work into tasks of a predefined size and then executes those tasks in parallel.
Do you think a MapReduce style of execution could expedite the batch job? First, I would like to know whether MapReduce is even a feasible solution for this sort of problem.
It sounds like the individual invocations are independent of each other; is that the case? If so, Hadoop (or MapReduce) wouldn't add anything. You could just schedule the invocations on different machines and be done with it.
Yes, your understanding is correct: each task in the batch job is distinct, and the tasks are not interdependent. You suggested running the job over different machines, but if multiple machines are not available, can we achieve the same thing using Java threading? Any insight would be helpful.
You may see a speedup by using multiple threads, or you may not; it depends a great deal on the problem at hand. If it's pure computation (with little or no I/O interspersed), it's unlikely to become noticeably faster through multithreading on a single processor, since the threads would all be competing for the same CPU.
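If you do want to try it, the standard way to run a batch of independent tasks concurrently in Java is an ExecutorService with a fixed-size thread pool. Here is a minimal sketch; `runTask` is a hypothetical stand-in for one invocation of your existing application (the database work and calculations would go there), and the pool size and task count are just placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchRunner {

    // Hypothetical stand-in for one invocation of the existing app:
    // in reality this would load the input for taskId (e.g. via Hibernate)
    // and do the mathematical calculations.
    static double runTask(int taskId) {
        return taskId * 2.0;
    }

    public static void main(String[] args) throws Exception {
        int nTasks = 1000; // e.g. one per line of the batch input file

        // One worker thread per available core is a reasonable default.
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        // Wrap each independent invocation as a Callable.
        List<Callable<Double>> tasks = new ArrayList<>();
        for (int i = 0; i < nTasks; i++) {
            final int id = i;
            tasks.add(() -> runTask(id));
        }

        // invokeAll blocks until every task has completed.
        double sum = 0;
        for (Future<Double> f : pool.invokeAll(tasks)) {
            sum += f.get(); // get() rethrows any exception a task threw
        }
        pool.shutdown();

        System.out.println("Processed " + nTasks + " tasks, sum = " + sum);
    }
}
```

Whether this helps depends on the point above: if each task spends much of its time waiting on the database, the threads can overlap that I/O and you should see a real improvement; if each task is pure computation on a single-core box, they will just take turns on the CPU.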