| Author |
mapreduce giving a wrong count
|
praveenKumar Bandi
Greenhorn
Joined: Feb 11, 2013
Posts: 1
|
|
Hi All,
I am new to MapReduce and am trying to explore it a little. I took the basic WordCount example and ran it over data stored in a MySQL table, and it reports a count of 34 for every record in the table. I assume the map function is being called 34 times for each record in the table. Is there a way to control how many times the map function is called? Please let me know if I am missing something. Any help on this is appreciated.
Here is the code that I am using:
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount1 {
    public static Connection Con;
    public static Statement statement = null;
    public static PreparedStatement preparedStatement = null;
    public static ResultSet resultSet = null;

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        public static Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            try {
                Class.forName("com.mysql.jdbc.Driver");
                Con = DriverManager.getConnection("jdbc:mysql://<<ip_add>>:3306/test", "user", "mysql");
                statement = Con.createStatement();
                resultSet = statement.executeQuery("select * from test.a1");
                while (resultSet.next()) {
                    word.set(resultSet.getString("aname"));
                    output.collect(word, one);
                }
            } catch (ClassNotFoundException e) {
                System.out.println("no MYSQL Driver found");
                word.set(e.toString());
                output.collect(word, one);
            } catch (SQLException e) {
                word.set(e.toString());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount1.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
output:
name1 34
name2 34
Thanks in Advance,
Praveen K Bandi
|
 |
Amruth Puppala
Ranch Hand
Joined: Jul 14, 2008
Posts: 295
|
|
Hi Praveen,
I'm also very new to Hadoop but let me try answering you.
Hadoop is best suited to unstructured or semi-structured data; that is where its features pay off.
I don't know why you are reading the data from a database inside the job; you could read it from a file or another input source instead.
You are right that the map function is being called 34 times.
The Hadoop framework calls the map function once for each input record. With TextInputFormat each line of the input file is one record, so the file you pass as args[0] presumably has 34 lines.
But inside map you query the database on every call and emit every row of the table, so each name gets counted once per call. The map function should only work on the data the framework hands to it.
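To make the arithmetic concrete, here is a minimal plain-Java sketch of what seems to be happening. It is not Hadoop code: `MapCallDemo`, `mapIgnoringValue` and `run` are made-up names, the 34 input lines are an assumption about the file passed as args[0], and the two-row list stands in for the table test.a1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the posted job; no Hadoop or MySQL required.
class MapCallDemo {

    // Mimics the posted map(): ignores its input value and emits every table row.
    static void mapIgnoringValue(String value, List<String> tableRows, List<String> emitted) {
        emitted.addAll(tableRows);
    }

    // Mimics the framework: one map call per input line, then a sum per key.
    static Map<String, Integer> run(List<String> inputLines, List<String> tableRows) {
        List<String> emitted = new ArrayList<>();
        for (String line : inputLines) {
            mapIgnoringValue(line, tableRows, emitted);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : emitted) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> inputLines = new ArrayList<>();
        for (int i = 0; i < 34; i++) {
            inputLines.add("line " + i); // assumption: the input file has 34 lines
        }
        List<String> tableRows = Arrays.asList("name1", "name2"); // stand-in for test.a1
        System.out.println(run(inputLines, tableRows)); // prints {name1=34, name2=34}
    }
}
```

Under these assumptions every name comes out as (number of input lines) x (times it appears in the table), which matches the output in the question.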
Usually that data arrives in the Text value parameter, and we perform our operation on value.
So I guess your job is configured correctly; just don't connect to the DB and read from the ResultSet inside map.
Each record should reach map through value, so try building your output from that value.
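As a rough illustration of that advice, here is a plain-Java sketch of a map step that only looks at the record it is given. `ValueMapDemo` and its methods are hypothetical names, and the three input records are invented. In the real job the same logic would sit inside `map(LongWritable key, Text value, ...)` and start from `value.toString()`. It assumes the names reach the job as lines of text, for example by exporting the table to a file. Hadoop also ships a `DBInputFormat` for reading rows from a database, which may be worth a look if the data has to stay in MySQL.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java stand-in; in a real job this logic would live in the Mapper's map method.
class ValueMapDemo {

    // Counts only the words in the single record it was handed.
    static void map(String value, Map<String, Integer> counts) {
        StringTokenizer tokens = new StringTokenizer(value);
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
    }

    // Mimics the framework: one map call per input record.
    static Map<String, Integer> run(String... records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            map(record, counts);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input: one name per line, e.g. the table exported to a file.
        System.out.println(run("name1", "name2", "name1")); // prints {name1=2, name2=1}
    }
}
```

Because each call touches only its own record, the totals no longer depend on how many times map is invoked.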
I hope you understood.
Regards
Amruth