Processing a file without proper line endings using Hadoop MapReduce
posted 3 years ago
A Hadoop Mapper reads each line as a value. I processed a file containing username, comments, etc., where each record (username, comments, etc.) was on a separate line. I successfully extracted the comments and did the required manipulation.
Now I have to process a file in which each record is not on its own line, i.e. the line breaks are irregular. Can you advise me how to process this file? Since the Hadoop Mapper reads each line as a value, how do I process the file when there is no proper end of line?
MapReduce uses a record reader behind the scenes, which by default reads one line at a time. You can override this behaviour with a custom record reader and take control of what constitutes a record. Look into the org.apache.hadoop.mapred.RecordReader interface; there are several implementations of it available out of the box. A custom reader is plugged into the job through an InputFormat whose getRecordReader method returns it.
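To make the idea concrete, here is a minimal, Hadoop-free sketch of the loop such a reader might run. It assumes, purely for illustration, that each record ends with a known delimiter (";" in the example) rather than a newline; the class name DelimitedRecordReader and its methods are hypothetical and not part of Hadoop. In a real job, logic like this would sit inside the next() method of your RecordReader implementation, and you would also have to handle records that straddle input split boundaries, which this sketch ignores.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Hadoop-free sketch: reads records terminated by a delimiter instead of a newline. */
public class DelimitedRecordReader {
    private final Reader in;
    private final String delimiter; // assumed record terminator, e.g. ";"

    public DelimitedRecordReader(Reader in, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.in = in;
        this.delimiter = delimiter;
    }

    /** Returns the next record with stray line breaks flattened to spaces, or null at end of input. */
    public String next() throws IOException {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            if (endsWith(buf, delimiter)) {
                // Drop the delimiter itself; it is not part of the record.
                buf.setLength(buf.length() - delimiter.length());
                return normalize(buf);
            }
        }
        // End of input: emit a trailing record only if it has content.
        String last = normalize(buf);
        return last.isEmpty() ? null : last;
    }

    private static boolean endsWith(StringBuilder buf, String suffix) {
        int start = buf.length() - suffix.length();
        return start >= 0 && buf.indexOf(suffix, start) == start;
    }

    private static String normalize(StringBuilder buf) {
        // Irregular line breaks inside a record become single spaces.
        return buf.toString().replaceAll("[\\r\\n]+", " ").trim();
    }

    /** Convenience helper for small in-memory inputs. */
    public static List<String> readAll(String text, String delimiter) {
        DelimitedRecordReader reader =
                new DelimitedRecordReader(new StringReader(text), delimiter);
        List<String> records = new ArrayList<>();
        try {
            String record;
            while ((record = reader.next()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Two records whose line breaks fall in the middle of the comments.
        String input = "alice,first comment\nspans two lines;bob,second\ncomment;";
        for (String record : readAll(input, ";")) {
            System.out.println(record);
        }
    }
}
```

If your records really do end in a fixed delimiter, it may also be worth checking whether your Hadoop version lets TextInputFormat pick it up from the textinputformat.record.delimiter configuration property, which could save you from writing a reader at all. If the record boundary is something more complex (a pattern, a fixed length, matching brackets), only the test inside the loop above would need to change.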