
how to set map-reduce task?

Joseph Hwang

Joined: Aug 17, 2013
Posts: 17
The following data were extracted by the 1st map-reduce job:

country ; title ; sex ; units ; file location
Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRQ647N.csv
Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRA647N.csv
Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRQ647S.csv
Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRA647S.csv

Then I try to set up a 2nd map-reduce job using the CSV files listed in the file-location column. The data format of each CSV file is like below:

year ; population
2004 ; 2130034
2005 ; 2239913
2006 ; 2437712
2007 ; 2210673

But I have no idea how to set up the 2nd map-reduce job using the file-location column data from the 1st job. The final output format should be like below:

country ; year ; population
Turkey ; 2004 ; 2130034
Turkey ; 2005 ; 2239913
Turkey ; 2006 ; 2437712
Turkey ; 2007 ; 2210673

As far as I know, the input file path is set only in the driver class, with the FileInputFormat.setInputPaths() method, but in my map-reduce job the file locations are produced only in the map and reduce classes. I wonder how to get the input file paths from the map/reduce output into the driver class.
How can I put a file-location value into the FileInputFormat.setInputPaths() method, for example FileInputFormat.setInputPaths(job, new Path("L/F/W/A/5/LFWA55MATRQ647N.csv"));?
I need your advice. Thanks in advance for your help!
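To make the question concrete, this is roughly what I imagine the 2nd driver would have to do: read the 1st job's output, split each line on ';', take the fifth field, and add each path as an input. A minimal sketch of just the parsing step (the class name and hardcoded sample lines are only illustrative; in reality the lines would be read from HDFS, e.g. via FileSystem.open()):

```java
import java.util.ArrayList;
import java.util.List;

public class SecondJobInputs {

    // Extract the file-location column (5th ';'-separated field)
    // from one line of the 1st job's output.
    static String extractLocation(String line) {
        String[] fields = line.split(";");
        return fields[4].trim();
    }

    public static void main(String[] args) {
        // Sample lines standing in for the 1st job's output read from HDFS.
        List<String> job1Output = new ArrayList<>();
        job1Output.add("Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRQ647N.csv");
        job1Output.add("Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRA647N.csv");

        for (String line : job1Output) {
            String location = extractLocation(line);
            System.out.println(location);
            // In the real driver of the 2nd job you would then call, per path:
            //   FileInputFormat.addInputPath(job2, new Path(location));
        }
    }
}
```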
Rajesh Nagaraju
Ranch Hand

Joined: Nov 27, 2003
Posts: 63
One way to chain MR jobs is to use Spring Batch.
amit punekar
Ranch Hand

Joined: May 14, 2004
Posts: 544
Are your CSV files on HDFS? How big is each file, i.e. how many rows of "year";"population" does it contain? You could copy them to HDFS first.

Then run a Pig script, which would automatically chain the required MR jobs to process the data.

The Pig script would roughly look like this (assuming the output of your 1st MR job is in a file):
1) Read the 1st MR output with schema - country,title,sex,units, file location (or name)
2) If CSV files are on HDFS, read those file using schema - file location (or name), year, population [You may have to write your own Loader Function for this as we want to have File location as one of the output fields]
3) Join 1 and 2 using "file location (name)" which would result in desired output i.e.
country, year, population
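A rough Pig Latin sketch of those three steps (the paths and the loader name MyCsvLoaderWithFileName are hypothetical; the custom loader from step 2 would have to emit the source file name as the first field):

```
-- 1) the 1st MR job's output: country;title;sex;units;location
meta = LOAD '/data/job1_output' USING PigStorage(';')
       AS (country:chararray, title:chararray, sex:chararray,
           units:chararray, location:chararray);

-- 2) the CSV rows, tagged with their source file name by a custom
--    loader (MyCsvLoaderWithFileName is a hypothetical name)
rows = LOAD '/data/csv' USING MyCsvLoaderWithFileName()
       AS (location:chararray, year:int, population:long);

-- 3) join on the file location and keep country, year, population
joined = JOIN meta BY TRIM(location), rows BY location;
result = FOREACH joined GENERATE meta::country, rows::year, rows::population;
STORE result INTO '/data/final_output' USING PigStorage(';');
```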

Of course, this can all be done using plain MR as well, but you will have to chain the jobs together yourself. Whichever way you proceed, I believe you will need the CSV files on the HDFS cluster.

