candidate for map reduce

Mohit Sinha
Ranch Hand

Joined: Nov 29, 2004
Posts: 125
Hi

I'd like to hear your thoughts on the following. I have recently been looking at Hadoop, the open source MapReduce framework. I currently have a standalone Java application. Its target users will invoke it by supplying input details through a front end that our client will develop; our part was only to develop the Java app.
Our Java app basically performs some mathematical calculations after interacting with the database through Hibernate, an object-relational mapping framework.
Now there is a new requirement: the application needs a second invocation mechanism, batch jobs. A file will contain 1000 or more invocations, and we have to execute the application once for each of them. With the current batch process, the invocations would run sequentially.
As I understand it, Hadoop divides the work into tasks of some predefined size and then spawns threads to execute those tasks.

Do you think MapReduce-style execution could speed up the batch job? First of all, I would like to know whether MapReduce is a feasible solution for this sort of problem.

Mohit
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42951
It sounds like the individual invocations are independent of each other; is that the case? If so, Hadoop (or MapReduce) wouldn't add anything. You could just schedule the invocations on different machines and be done with it.
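As a rough illustration of the "different machines" idea, here is a minimal sketch that splits the lines of a batch file into one chunk per machine. The class name `BatchSplitter`, the `split` method, and the `job-N` strings are made up for this example; how each chunk actually gets shipped to and started on a machine is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSplitter {

    // Distributes the invocations round-robin over the given number of machines.
    // "machines" is a hypothetical count; each returned list is one machine's share.
    static List<List<String>> split(List<String> invocations, int machines) {
        if (machines <= 0) {
            throw new IllegalArgumentException("machines must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < machines; i++) {
            chunks.add(new ArrayList<>());
        }
        for (int i = 0; i < invocations.size(); i++) {
            chunks.get(i % machines).add(invocations.get(i));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Placeholder invocations; in practice these would be the lines of the batch file.
        List<String> invocations = Arrays.asList("job-1", "job-2", "job-3", "job-4", "job-5");
        List<List<String>> chunks = split(invocations, 2);
        for (int m = 0; m < chunks.size(); m++) {
            System.out.println("machine " + m + ": " + chunks.get(m));
        }
    }
}
```

Because the invocations are independent, no result has to be combined across chunks, which is why a reduce step would have nothing to do here.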
Mohit Sinha
Ranch Hand

Joined: Nov 29, 2004
Posts: 125
Yes, your understanding is correct. Each task in the batch job is distinct, and the tasks do not depend on each other. You suggested running the job on different machines, but if multiple machines are not available, can we achieve the same thing with Java threads?
Any insight on this would be helpful.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42951
You may see a speedup by using multiple threads or you may not; it depends a great deal on the problem at hand. If it's pure computation -with little or no I/O interspersed- it's unlikely to become noticeably faster through multithreading.
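If you want to try it and measure, a minimal sketch using a fixed-size thread pool from `java.util.concurrent` could look like this. `ParallelBatch`, `runAll`, and `runInvocation` are hypothetical names, and `runInvocation` is only a stand-in computation, not the real application. Note also that a Hibernate `Session` is not thread-safe, so a real version would need each task to open its own session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBatch {

    // Hypothetical stand-in for one invocation of the application:
    // returns the sum of squares 1^2 + ... + input^2.
    static long runInvocation(int input) {
        long sum = 0;
        for (int i = 1; i <= input; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs every invocation on a pool with the given number of threads
    // and returns the results in the same order as the inputs.
    static List<Long> runAll(List<Integer> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (final int input : inputs) {
                tasks.add(() -> runInvocation(input));
            }
            List<Long> results = new ArrayList<>();
            // invokeAll blocks until all tasks have completed.
            for (Future<Long> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an invocation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> inputs = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            inputs.add(i);
        }
        int threads = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        List<Long> results = runAll(inputs, threads);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " invocations in " + millis
                + " ms on " + threads + " threads");
    }
}
```

Running it once with `threads` set to 1 and once with the number of available processors gives a quick comparison for your own workload.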
 