Not getting performance with MapReduce

 
Greenhorn
Posts: 2
I am working with Hadoop MapReduce to get a performance benefit, but when I run my program on Hadoop it takes about 37 minutes, whereas a simple C++ program doing the same task takes only about 5 minutes.
 
Rancher
Posts: 2759
Please TellTheDetails. What is your application doing? Where is it spending most of its time?
 
Priyanka Suresh Shinde
Greenhorn
Posts: 2
The input file contains a number of records, one per line. I have written a simple program to print those lines in which three words are common. In the map function I pass the word as the key and the record as the value, and I compare those records in the reduce function.
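Roughly, the structure of my job looks like this (a simplified sketch rather than my exact code; the class and method names here are just placeholders):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CommonWordsJob {

    // Mapper: for every word in a record, emit (word, whole record).
    public static class WordRecordMapper extends Mapper<Object, Text, Text, Text> {
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, value);
                }
            }
        }
    }

    // Reducer: all records sharing a given word arrive together; compare them
    // pairwise and emit the pairs that have at least three words in common.
    // (The same pair can be emitted under more than one shared word.)
    public static class CommonRecordReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> records = new ArrayList<>();
            for (Text value : values) {
                records.add(value.toString()); // Text objects are reused, so copy
            }
            for (int i = 0; i < records.size(); i++) {
                for (int j = i + 1; j < records.size(); j++) {
                    if (commonWordCount(records.get(i), records.get(j)) >= 3) {
                        context.write(new Text(records.get(i)), new Text(records.get(j)));
                    }
                }
            }
        }

        // Count how many distinct words two records share.
        private static int commonWordCount(String a, String b) {
            Set<String> wordsOfA = new HashSet<>(Arrays.asList(a.split("\\s+")));
            int common = 0;
            for (String w : b.split("\\s+")) {
                if (wordsOfA.remove(w)) {
                    common++;
                }
            }
            return common;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "common words");
        job.setJarByClass(CommonWordsJob.class);
        job.setMapperClass(WordRecordMapper.class);
        job.setReducerClass(CommonRecordReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}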
 
Sheriff
Posts: 3837
Welcome to the Ranch, Priyanka!

Parallel processing is not a silver bullet that instantly makes every program run N times faster. It adds a lot of overhead for creating all the workers, distributing the work to them, and then collecting the results and aggregating them again. If I understand your description correctly, there is hardly any actual processing - your workers have almost nothing to do.

Imagine you need to do a project that will take a man-year of work. You can do it yourself in a year, or you can hire ten developers, distribute the work among them, manage them, and deliver the project in, perhaps, three months. You might expect the project to be finished in five or six weeks, given that ten people are now working on it, but that won't be the case: the developers won't spend all their time coding, because they need to meet and coordinate their work, which isn't necessary when a single person does the whole job.

And now imagine that you hired ten developers to write a 20-line "Hello, world!" application. They would probably spend much, much more time on it than if you whipped the program up yourself. In theory each of them would write just two lines of code, but the overhead of coordinating their work is so large that it outweighs, many times over, any benefit of having multiple people working on it.

Your program is similar: the individual workers have very little to do, but the effort needed to coordinate them is the same as if they were working hard. A program this simple won't perform well on Hadoop. Only jobs whose Map and Reduce functions do a substantial amount of actual processing can see any speedup at all - Hadoop is best suited for cases where you can distribute a lot of work among a lot of workers.
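To put it in rough terms, the total running time is approximately the coordination overhead plus the actual work divided by the number of workers. The actual work here - splitting lines and comparing words - is what your C++ program gets through in about 5 minutes, so even a perfect split across many workers could save at most those 5 minutes. The coordination overhead, on the other hand - starting task JVMs, scheduling, writing intermediate data to disk and shuffling it across the network - is paid in full on top of that, and in your case it is evidently the bulk of the 37 minutes.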
 
Greenhorn
Posts: 12
How many rows are you trying to process? What are the details of your cluster? How many nodes?
 