A Hadoop Mapper reads each line as a value. I processed a file containing username, comments, etc., where each record (username, comments, etc.) was on a separate line. I processed it successfully, extracting the comments and doing the required manipulation.
Now I have to process a file in which each record is not on a separate line, i.e. the line breaks are irregular. Can you advise me how to process this file? Since the Hadoop Mapper reads each line as a value, how do I process it when there is no proper end of line?
MapReduce uses a record reader behind the scenes, which by default reads one line at a time. You can override this behaviour with a custom record reader and take control of what constitutes a record. Look into the org.apache.hadoop.mapred.RecordReader interface. There are several implementations of this interface available out of the box.
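To make the idea concrete, here is a rough plain-Java sketch of the record-splitting logic a custom record reader might contain. It deliberately does not use any Hadoop classes, so it can be run on its own. The class name DelimitedRecordReader, the helper splitRecords, and the ";" delimiter are all hypothetical, and it assumes your records end with some recognisable delimiter other than a newline, which may not hold for your file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Hadoop class: treats everything up to a custom
// delimiter as one record, so stray line breaks inside a record are harmless.
final class DelimitedRecordReader {
    private final Reader in;
    private final String delimiter;

    DelimitedRecordReader(Reader in, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.in = in;
        this.delimiter = delimiter;
    }

    // Returns the next record, or null once the input is exhausted.
    String next() throws IOException {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            int start = buf.length() - delimiter.length();
            // A record is complete when the buffer ends with the delimiter.
            if (start >= 0 && buf.indexOf(delimiter, start) == start) {
                buf.setLength(start); // drop the delimiter itself
                return normalize(buf);
            }
        }
        // End of input: emit trailing text only if something is left.
        String tail = normalize(buf);
        return tail.isEmpty() ? null : tail;
    }

    // Collapses line breaks inside a record into single spaces.
    private static String normalize(CharSequence raw) {
        return raw.toString().replaceAll("[\\r\\n]+", " ").trim();
    }

    // Convenience wrapper for in-memory text (assumed for this example only).
    static List<String> splitRecords(String text, String delimiter) {
        List<String> records = new ArrayList<>();
        try {
            DelimitedRecordReader reader =
                    new DelimitedRecordReader(new StringReader(text), delimiter);
            String record;
            while ((record = reader.next()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "alice,first comment\nthat wraps;bob,second\r\ncomment;";
        for (String record : splitRecords(text, ";")) {
            System.out.println(record);
        }
    }
}
```

In a real job, logic along these lines would sit inside the next method of your own RecordReader implementation, returned from a matching input format, and it would also have to deal with records that straddle input split boundaries, which this sketch ignores.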