I wrote this Hadoop job to preprocess files and write them back out to the output directory. I see files named part-000, 0001 and so on being created, but they are all empty. I use NullWritable for the key but set Text for the value. I am not sure if it's because of that.
The following is my code:
posted 4 years ago
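TextInputFormat already splits your input by line and hands your map function only one line at a time, so you don't need the while loop. Also, String.contains() takes a CharSequence, not a regular expression. Unless you are looking for the literal character sequence "[A-Za-z]", you want to use String.matches(). Keep in mind that matches() has to match the whole string, so to test whether a line contains a letter anywhere you would need a pattern like ".*[A-Za-z].*".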
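To make the difference concrete, here is a small standalone sketch in plain Java (no Hadoop needed). The class name LineFilterDemo, the method names, and the sample line are made up for illustration. I am also assuming that the intent was "does this line contain at least one letter", which may not be exactly what your code checks.

```java
import java.util.regex.Pattern;

public class LineFilterDemo {
    // Assumed intent: at least one ASCII letter somewhere in the line.
    private static final Pattern HAS_LETTER = Pattern.compile("[A-Za-z]");

    // contains() is a literal substring search; the brackets are NOT a character class here.
    public static boolean containsLiteral(String line) {
        return line.contains("[A-Za-z]");
    }

    // matches() is true only if the ENTIRE line matches the regex (here: exactly one letter).
    public static boolean matchesWhole(String line) {
        return line.matches("[A-Za-z]");
    }

    // find() is true if ANY part of the line matches the regex.
    public static boolean hasLetter(String line) {
        return HAS_LETTER.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "hello 123"; // hypothetical input line
        System.out.println(containsLiteral(line)); // false
        System.out.println(matchesWhole(line));    // false
        System.out.println(hasLetter(line));       // true
    }
}
```

In your mapper you would run a check like hasLetter() once on value.toString() per map() call, since each call already receives a single line.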