
Not getting performance with MapReduce

 
Priyanka Suresh Shinde
Greenhorn
Posts: 2
I am working with Hadoop MapReduce to get a performance benefit, but when I run my program on Hadoop it takes about 37 minutes, whereas a simple C++ program doing the same task takes only about 5 minutes.
 
Jayesh A Lalwani
Rancher
Posts: 2756
Please TellTheDetails. What is your application doing? Where is it spending the most time?
 
Priyanka Suresh Shinde
Greenhorn
Posts: 2
The input file contains a number of records, one per line. I have written a simple program that prints the lines that have three words in common. In the map function I emit each word as the key and the whole record as the value, and I compare those records in the reduce function.
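To make the described job concrete, here is a minimal in-memory sketch of the same map / shuffle / reduce flow in plain Java, with no Hadoop classes involved. It assumes that "three words in common" means two lines share at least three distinct words, that words are whitespace-separated and compared case-insensitively, and the class and method names (`SharedWordsSketch`, `map`, `shuffle`, `reduce`) are hypothetical, not Hadoop's API. The map step emits (word, record), the shuffle groups records by word the way the framework does between phases, and the reduce step compares every pair of records in a group.

```java
import java.util.*;

public class SharedWordsSketch {
    // Assumption: a match is two lines sharing at least this many distinct words.
    static final int MIN_SHARED = 3;

    // Split a line into its distinct, lower-cased words.
    static Set<String> words(String line) {
        Set<String> ws = new HashSet<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) ws.add(w);
        }
        return ws;
    }

    // Map: emit (word, record) once for every distinct word in each record.
    static List<Map.Entry<String, String>> map(List<String> records) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String r : records) {
            for (String w : words(r)) out.add(new AbstractMap.SimpleEntry<>(w, r));
        }
        return out;
    }

    // Shuffle: group the records by key word (Hadoop does this between map and reduce).
    static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> groups = new HashMap<>();
        for (Map.Entry<String, String> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: compare every pair of records that contain the same key word.
    // A pair sharing k words appears in k groups, so a Set removes the duplicates.
    static Set<List<String>> reduce(Map<String, List<String>> groups) {
        Set<List<String>> matches = new HashSet<>();
        for (List<String> recs : groups.values()) {
            for (int a = 0; a < recs.size(); a++) {
                for (int b = a + 1; b < recs.size(); b++) {
                    Set<String> common = words(recs.get(a));
                    common.retainAll(words(recs.get(b)));
                    if (common.size() >= MIN_SHARED) {
                        String x = recs.get(a), y = recs.get(b);
                        // Store each pair in a fixed order so duplicates compare equal.
                        matches.add(x.compareTo(y) <= 0 ? Arrays.asList(x, y) : Arrays.asList(y, x));
                    }
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "the quick brown fox jumps",
                "a quick brown fox sleeps",
                "the lazy dog sleeps");
        for (List<String> pair : reduce(shuffle(map(records)))) {
            System.out.println(pair.get(0) + " | " + pair.get(1));
        }
    }
}
```

With this sample input only the first two lines match (they share "quick", "brown" and "fox"). Note how little computation each map and reduce call does: a split and a few set operations. Also, a very common word such as "the" puts most records into a single reduce group, and the pairwise comparison inside that group grows quadratically with its size.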
 
Martin Vajsar
Sheriff
Posts: 3747
Welcome to the Ranch, Priyanka!

Parallel processing is not a silver bullet that will instantly make every program run x times faster. It adds a lot of overhead: starting all the workers, distributing the work to them, and then collecting and aggregating the results. If I understand your description correctly, there isn't any real processing here: your workers do nothing.

Imagine you need to do a project that will take a man-year of work. You can do it yourself in a year, or you can hire ten developers, distribute the work among them, manage them and deliver the project in perhaps three months. You might expect the project to be finished in five or six weeks, given that ten people are now working on it, but that won't be the case. The developers won't spend all their time coding; they will need to meet and coordinate their work, which isn't needed when one person does all of it.

Now imagine hiring ten developers to write a 20-line "Hello, world!" application. They will probably spend much, much more time on it than you would if you whipped up the program yourself. In theory each of them would write just two lines of code, but the overhead of coordinating their work is so large that it exceeds any benefit of having multiple people on it several times over.

Your program is similar: individual workers have very little work to do, but the effort needed to coordinate them is the same as if they were working hard. A program this simple won't work well with Hadoop. Only programs whose Map and Reduce functions do a substantial amount of work, compared with the coordination overhead, can see any speedup at all. Hadoop is best suited to cases where there is a lot of work to distribute among a lot of workers.
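The overhead argument can be put into a back-of-the-envelope cost model. This is a hypothetical illustration, not a model of how Hadoop actually schedules jobs: it assumes a plain program takes exactly `work` seconds, while a distributed job pays a fixed startup cost, a per-worker coordination cost, and then splits the work evenly. All numbers below are made up.

```java
public class OverheadModel {
    // Hypothetical model (illustration only, not measured Hadoop behaviour):
    //   plain single program: T = work
    //   distributed job:      T = fixedOverhead + perWorkerOverhead * workers + work / workers
    static double sequential(double work) {
        return work;
    }

    static double distributed(double work, int workers, double fixedOverhead, double perWorkerOverhead) {
        return fixedOverhead + perWorkerOverhead * workers + work / workers;
    }

    public static void main(String[] args) {
        double fixed = 60, perWorker = 5; // made-up seconds of overhead
        for (double work : new double[] {30, 36000}) {
            System.out.printf("work=%.0fs  sequential=%.0fs  10 workers=%.0fs%n",
                    work, sequential(work), distributed(work, 10, fixed, perWorker));
        }
    }
}
```

With these made-up numbers, 30 seconds of work takes 113 seconds on ten workers, while 36000 seconds of work drops to 3710 seconds. The overhead terms stay the same in both cases; only the amount of real work decides whether distributing pays off.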
 
Saurabh Rana
Greenhorn
Posts: 7
How many rows are you trying to process? What are the details of your cluster? How many nodes?
 