How to consolidate result of map-reduce

 
Bartender
Hi,

This might be a very basic question, but so far I have not been able to find any concrete direction.

Is there any facility in map-reduce to consolidate the output?
e.g. I have an HBase cluster where I run an M-R job (actually, it just filters data, so I don't need a reducer; it is only a mapper). As of now, I'm writing this data to a log file, but the problem is that, since it is a cluster, the user (or process) has to browse through log files on various nodes (or hosts).

So, is it possible to do either of the following?
1) Populate a container from the MR process and return the result to the client.
2) Flush that data to a single resource (e.g. a log file).

Due to business reasons, flushing the data to another HBase table is not possible.

Thanks in advance.
 
Greenhorn
I am not sure whether this solution will help you.

We can use context.write(key, value), which writes the mapper results to the job's output path. If you want the results written in a specific format, you can write your own custom output format class and configure the job to use it; context.write will then emit records in that format.

If you want a detailed answer, please describe the sample input and output and share your Mapper class.

Thanks,
Arumugarani

 
Anayonkar Shivalkar
Bartender
Hi arumugarani,

Thanks for the reply and Welcome to CodeRanch!

The problem with context.write is that, as I mentioned, the M-R process runs in a cluster (with multiple hosts and nodes).

So, as per my understanding, context.write will still write the data (or whatever we want to write) to the file system of each node (i.e. wherever M-R is running). Correct me if I'm wrong.

What I'm looking for is a way to consolidate all the output data at one place.

Thanks.
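
A note on the point above: with a file-based output format such as TextOutputFormat, context.write does not write to each node's local disk. The records go to the job's output directory on the file system the job is configured with (usually HDFS), and a map-only job leaves one part-m-NNNNN file per map task there. Consolidating then means concatenating those part files, which `hadoop fs -getmerge <output dir> <local file>` does from the shell. Below is a minimal plain-Java sketch of the same idea, assuming the output directory has already been copied to (or is mounted on) a local path. The class name PartFileMerger and its merge method are made up for illustration and are not part of Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a Hadoop class): concatenates the part files of a
// job output directory that is reachable as a local path.
class PartFileMerger {

    // Appends every regular file whose name starts with "part-" to target,
    // in name order, and returns the number of bytes written.
    // Marker files such as _SUCCESS and hidden .crc files are skipped.
    static long merge(Path outputDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(outputDir)) {
            parts = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        long bytes = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                bytes += Files.copy(part, out); // stream one part file into the target
            }
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PartFileMerger <output dir> <merged file>");
            return;
        }
        long bytes = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + bytes + " bytes into " + args[1]);
    }
}
```

The merged file should be written outside the output directory (or at least not named part-...), otherwise it would be picked up as an input on a later run.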
 