This might be a very basic question, but so far I haven't been able to find any concrete direction.
Is there any facility in map-reduce to consolidate the output?
For example, I have an HBase cluster where I run an M-R job (actually, it's just filtering the data, so I don't need a reducer, only a mapper). As of now, I'm writing this data to a log file, but the problem is that, since it is a cluster, the user (or process) has to browse through log files on various nodes (or hosts).
So, is it possible to do either of the following:
1) Populate a container from MR process and return the result to client.
2) Flush that data to a single resource (e.g. log file)
Due to business reasons, flushing the data to another HBase table is not possible.
I'm not sure whether this solution will help you.
You can use context.write(key, value), which writes the mapper's results to the job's output path. If you want the results written in a specific format, you can write your own custom OutputFormat class and set it on the job, so that everything passed to context.write goes through it.
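One thing to keep in mind: in a map-only job, each map task normally writes its own part-m-NNNNN file under the output path, so you still end up with several files. `hadoop fs -getmerge <output dir> <local file>` copies and concatenates them in one step. If you would rather do it from your own code, here is a rough sketch in plain Java that assumes the part files have already been copied to a local directory. PartFileMerger and its merge method are hypothetical names I made up for illustration; they are not part of Hadoop or HBase.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Hadoop/HBase class): merges local copies of
// the part-* files produced by a job into one file.
final class PartFileMerger {

    // Concatenates every "part-*" file in dir into target, in file-name
    // order, and returns the number of lines written. Other files such as
    // _SUCCESS are ignored because they do not match the glob.
    static long merge(Path dir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // directory listing order is not guaranteed

        long lines = 0;
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Path part : parts) {
                try (BufferedReader in = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.write(line);
                        out.newLine();
                        lines++;
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PartFileMerger <dir-with-part-files> <merged-file>");
            return;
        }
        long n = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("merged " + n + " lines");
    }
}
```

This assumes the output is line-oriented UTF-8 text (as the default text output would be) and that the merged file is not itself named part-*. For binary output formats the files would have to be read with the matching reader instead.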
If you want a detailed answer, please describe the sample input and output and share your Mapper class.