I am new to Hadoop as well as Java.
While looking into the "WordCount" problem, I got confused by the use of generics in the WordCountMapper class.
The class looks like:
I know that generics are used here to specify the key/value pair types for the mapper's input and output. But I want to know what the advantage is of using generics to specify the key/value types.
Is this the only way to specify the key/value types for the mapper class? Please explain in detail.
The first and foremost reason is that Apache Hadoop itself declares the Mapper class with generic type parameters, `Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>`, which technically forces us to deal with them: a subclass either supplies concrete types in place of the placeholders, or stays generic itself and passes the type parameters through to Mapper.
Secondly, Mapper was made generic so that we are free to choose the types (classes) of the key and value objects in the mapper's key/value pairs. Apache Hadoop could very well have locked us into predefined types for key and value (e.g. Text and IntWritable as the output key and output value types). But in that case the programmer could not output anything from a Mapper other than <Text,IntWritable>. What if you want to use <FloatWritable,Text> instead? Or, to make it more complex, what if you want to define your own classes for them, such as <JoneKeyOutputType,JoneValueOutputType>?

That is why Mapper uses generics: it gives us a great deal of freedom in choosing these types for mappers, reducers and so on. The whole point of using generics, here and anywhere else, is to separate logic from data type. There is no need for a separate definition of Mapper just because the user decides to go with a <key,value> combination other than one defined in the Hadoop API. The Mapper logic stays the same, yet the programmer can specify types of their own choice.
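To make the "separate logic from data type" point concrete, here is a minimal sketch in plain Java. It does not use Hadoop at all: `SimpleMapper`, `emit`, `run` and `LineLengthMapper` are hypothetical names invented for this illustration, and plain `Long`/`String`/`Integer`/`Float` stand in for Hadoop's `LongWritable`/`Text`/`IntWritable`/`FloatWritable`. The only thing borrowed from Hadoop is the shape of the four type parameters.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
// The driving logic below is written once and never mentions a concrete type.
abstract class SimpleMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    private final List<Map.Entry<KEYOUT, VALUEOUT>> output = new ArrayList<>();

    // Subclasses supply the per-record logic with their own types.
    protected abstract void map(KEYIN key, VALUEIN value);

    // Collects one output pair (plays the role of writing to the context).
    protected void emit(KEYOUT key, VALUEOUT value) {
        output.add(new SimpleEntry<>(key, value));
    }

    // Feeds every input record to map() and returns all emitted pairs.
    List<Map.Entry<KEYOUT, VALUEOUT>> run(List<Map.Entry<KEYIN, VALUEIN>> input) {
        for (Map.Entry<KEYIN, VALUEIN> record : input) {
            map(record.getKey(), record.getValue());
        }
        return output;
    }
}

// Word count: input <Long, String>, output <String, Integer>.
class WordCountMapper extends SimpleMapper<Long, String, String, Integer> {
    @Override
    protected void map(Long offset, String line) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emit(word, 1);
            }
        }
    }
}

// Same base class, different output types: <Float, String>.
class LineLengthMapper extends SimpleMapper<Long, String, Float, String> {
    @Override
    protected void map(Long offset, String line) {
        emit((float) line.length(), line);
    }
}

class GenericMapperDemo {
    public static void main(String[] args) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        input.add(new SimpleEntry<>(0L, "hello hadoop hello"));

        System.out.println(new WordCountMapper().run(input));
        System.out.println(new LineLengthMapper().run(input));
    }
}
```

Both subclasses reuse `run` and `emit` unchanged; only the type arguments and the body of `map` differ. The compiler also rejects something like `emit(1, "word")` inside `WordCountMapper`, which is the type-safety benefit you would lose if keys and values were passed around as plain `Object`.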