candidate for map reduce

Mohit Sinha
Ranch Hand

Joined: Nov 29, 2004
Posts: 125
Hi

I'd like your thoughts on whether my application is a good candidate for MapReduce. I have recently been looking at Hadoop, the open-source MapReduce framework. I currently have a standalone Java application. Its users will invoke it by supplying input details through a front end that our client will develop; our part was only to develop the Java app.
The app performs some mathematical calculations after interacting with the database through Hibernate, an object-relational mapping framework.
Now there is a new requirement: the application needs a second invocation mechanism, batch jobs. A file will contain 1000 or more invocations, and we have to execute the application once for each of them. With the current batch process, the invocations would run sequentially.
From what I have read, Hadoop divides the work into chunks of a predefined size and then spawns threads to execute them.

Do you think MapReduce-style execution could speed up the batch job? First of all, I would like to know whether MapReduce is a feasible solution for this sort of problem.

Mohit
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41137
It sounds like the individual invocations are independent of each other; is that the case? If so, Hadoop (or MapReduce) wouldn't add anything. You could just schedule the invocations on different machines and be done with it.
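
One way to picture "scheduling the invocations on different machines" is to split the invocation file into one shard per machine and run each shard with the existing sequential batch process. A minimal sketch, assuming each invocation is one line of text; `InvocationSharder` and `shard` are hypothetical names, not part of the application described above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InvocationSharder {

    /** Distributes the invocation lines round-robin over the given number of machines. */
    static List<List<String>> shard(List<String> invocations, int machines) {
        if (machines < 1) {
            throw new IllegalArgumentException("machines must be at least 1");
        }
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < machines; i++) {
            shards.add(new ArrayList<>());
        }
        // Invocation i goes to machine (i mod machines)
        for (int i = 0; i < invocations.size(); i++) {
            shards.get(i % machines).add(invocations.get(i));
        }
        return shards;
    }

    public static void main(String[] args) {
        // Placeholder invocation lines; the real file format is not known here
        List<String> invocations = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(shard(invocations, 2)); // prints [[a, c, e], [b, d]]
    }
}
```

This only works because the invocations are independent: no shard needs results from another, so no coordination framework is required.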


Mohit Sinha
Ranch Hand

Joined: Nov 29, 2004
Posts: 125
Yes, your understanding is correct: each task in the batch job is distinct, and the tasks do not depend on each other. You suggested running the job on different machines, but if multiple machines are not available, can we achieve the same with Java threads?
Any insight on this would be helpful.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41137
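You may see a speedup by using multiple threads or you may not; it depends a great deal on the problem at hand. If it's pure computation, with little or no I/O interspersed, the gain is limited by the number of CPU cores available, and on a single core it's unlikely to become noticeably faster through multithreading. If the tasks spend much of their time waiting on I/O, such as database calls, threads can overlap those waits.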
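
For the single-machine case, a fixed-size thread pool is one way to run the independent invocations concurrently. A minimal sketch, assuming each invocation is one line of input and the calculation can be expressed as a function of that line; `BatchRunner`, `runAll`, and the doubling function are hypothetical stand-ins, not the real application:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class BatchRunner {

    /**
     * Runs one independent task per invocation line on a fixed-size thread pool
     * and returns the results in the same order as the inputs.
     */
    static <R> List<R> runAll(List<String> invocations, Function<String, R> task, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> jobs = new ArrayList<>();
            for (String line : invocations) {
                jobs.add(() -> task.apply(line));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every job has completed
            for (Future<R> future : pool.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Batch interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Invocation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> invocations = Arrays.asList("1", "2", "3", "4");
        int threads = Runtime.getRuntime().availableProcessors();
        // Stand-in calculation: parse the line and double it
        List<Integer> results = runAll(invocations, s -> Integer.parseInt(s) * 2, threads);
        System.out.println(results); // prints [2, 4, 6, 8]
    }
}
```

Whether this is actually faster has to be measured with the real workload. Since the application uses Hibernate, note that a Hibernate `Session` is not thread-safe, so each task would need its own `Session` obtained from a shared `SessionFactory`.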
 