
Hadoop and compression

 
Andrew Mcmurray
Ranch Hand
Posts: 188
Hi all,

I'm pretty new to HDFS and am looking for opinions on some conflicting answers I've gotten recently.

1. Is it a good idea to compress the stream when writing a file out to Hadoop? One person told me they got a 10x benefit from doing this. Another told me that compressing is bad, because the MapReduce jobs that run on the file can't be distributed when the file is compressed.

2. I read that MapReduce jobs running on Hadoop work best with files in the 500 GB to terabyte range. Someone else told me it works better with smaller files.

Any thoughts?

Thanks,

AMD
 
Srinivasa Madugula
Greenhorn
Posts: 2
Hi Mcmurray,

As per the Definitive Guide, "All compression algorithms exhibit a space/time trade-off: faster compression and decompression speeds usually come at the expense of smaller space savings."
Hadoop supports a range of compression codecs, and you can choose one depending on what you need: better performance, better space savings, or a balance of the two.
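
For what it's worth, here is a minimal Java sketch of the usual pattern for writing a compressed stream to HDFS through a pluggable codec (it roughly mirrors the CompressionCodec examples in the Definitive Guide; the output path and the choice of GzipCodec are just placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    import java.io.OutputStream;

    public class CompressedHdfsWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Pick a codec: gzip is fast and common but NOT splittable;
            // bzip2 (BZip2Codec) compresses harder, is slower, and IS splittable.
            CompressionCodec codec =
                    ReflectionUtils.newInstance(GzipCodec.class, conf);

            // "/user/amd/data" is a made-up path; the codec supplies ".gz".
            Path out = new Path("/user/amd/data" + codec.getDefaultExtension());

            // Wrap the raw HDFS stream in the codec's compressing stream
            // and copy stdin (or any InputStream) into it.
            try (OutputStream compressed = codec.createOutputStream(fs.create(out))) {
                IOUtils.copyBytes(System.in, compressed, 4096, false);
            }
        }
    }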

Hadoop works well with large files. If you use Hadoop to store and process lots of small files:
i) The load on the NameNode increases, because every additional file means more metadata that the NameNode has to hold in memory and manage.
ii) Blocks may not be fully utilized, since each small file still occupies its own block.
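
As a rough rule of thumb, each file, directory, and block costs the NameNode on the order of 150 bytes of heap, so tens of millions of small files can eat gigabytes of NameNode memory. One common mitigation (not from this thread, just a standard technique) is to pack many small files into a single container such as a SequenceFile, keyed by file name. A sketch, assuming the inputs are local files small enough to buffer in memory and that /user/amd/packed.seq is a made-up output path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class PackSmallFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path("/user/amd/packed.seq")),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class))) {
                for (String name : args) { // each arg: one small local file
                    byte[] bytes = Files.readAllBytes(Paths.get(name));
                    // Key = original file name, value = raw file contents.
                    writer.append(new Text(name), new BytesWritable(bytes));
                }
            }
        }
    }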
 
Rajesh Nagaraju
Ranch Hand
Posts: 63
The other aspect to consider when deciding on compression is whether the compression format is splittable or not.
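
To make that concrete: a non-splittable format such as gzip forces MapReduce to hand the whole file to a single mapper, while a splittable format such as bzip2 can be divided across many mappers. A small sketch for checking a codec at runtime (the file names are invented examples; the codec is inferred from the extension):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    public class CheckSplittable {
        public static void main(String[] args) {
            CompressionCodecFactory factory =
                    new CompressionCodecFactory(new Configuration());
            for (String name : new String[] {"logs.gz", "logs.bz2"}) {
                CompressionCodec codec = factory.getCodec(new Path(name));
                // BZip2Codec implements SplittableCompressionCodec; GzipCodec does not.
                boolean splittable = codec instanceof SplittableCompressionCodec;
                System.out.println(name + " -> " + codec.getClass().getSimpleName()
                        + ", splittable: " + splittable);
            }
        }
    }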
 