
Hadoop and compression

Andrew Mcmurray
Ranch Hand

Joined: Sep 24, 2005
Posts: 188
Hi all

I am pretty new to HDFS and am looking for opinions on some conflicting answers I have recently gotten.

1. Is it a good idea to compress the stream when writing a file out to Hadoop? One person told me they got a 10x benefit from doing this. Another told me that compressing was bad because MapReduce jobs that ran on the file could not be distributed when the file was compressed.

2. I read that MapReduce jobs running on Hadoop work best with file sizes between 500 GB and the terabyte range. Someone else told me that it works better with smaller files.

Any thoughts?

Thanks,

AMD
Srinivasa Rao Madugula
Greenhorn

Joined: Jan 06, 2014
Posts: 1
Hi Mcmurray,

As per Hadoop: The Definitive Guide, "All compression algorithms exhibit a space/time trade-off: faster compression and decompression speeds usually come at the expense of smaller space savings."
Hadoop offers several compression formats, so you can choose one depending on your needs, i.e. whether you want better performance, better space savings, or a balance of both.
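To make the space/time trade-off concrete, here is a minimal sketch that uses the JDK's built-in `java.util.zip.Deflater` (DEFLATE, the algorithm gzip is based on) rather than a Hadoop codec, so it runs without Hadoop on the classpath. It compresses the same sample data at the fastest and the strongest levels and prints the size and a rough timing for each. The class name `CompressionTradeOff` and the repetitive log-style sample data are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressionTradeOff {

    // Compress input with DEFLATE at the given level (1 = fastest, 9 = smallest).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native zlib resources
        return out.toByteArray();
    }

    // Decompress back to the original bytes, to confirm the round trip is lossless.
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int offset = 0;
        try {
            while (!inflater.finished() && offset < originalLength) {
                offset += inflater.inflate(result, offset, originalLength - offset);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical, highly repetitive log-style sample data.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20_000; i++) {
            sb.append("2014-01-06 INFO record=").append(i % 100).append(" status=OK\n");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            byte[] packed = compress(data, level);
            long micros = (System.nanoTime() - start) / 1_000; // rough, single run
            boolean ok = Arrays.equals(data, decompress(packed, data.length));
            System.out.printf("level %d: %d -> %d bytes in %d us (round trip ok: %b)%n",
                    level, data.length, packed.length, micros, ok);
        }
    }
}
```

The actual sizes and timings depend heavily on the data and the JVM, so no figures are claimed here. On typical data the higher level tends to cost more CPU time for a comparatively modest size gain, which is the trade-off the quote describes.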

Hadoop works well with large files. If you use Hadoop to store and process many small files:
i) The load on the NameNode will be higher, because every additional small file adds metadata that has to be stored and managed on the NameNode.
ii) Blocks may not be fully utilized.
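A rough back-of-the-envelope sketch of the NameNode point. It assumes the commonly quoted rule of thumb that each file, directory, and block object costs about 150 bytes of NameNode heap, and a 128 MB block size (the Hadoop 2.x default; older releases used 64 MB). It compares the same 1 TB of data stored as 1,048,576 files of 1 MB versus 1,024 files of 1 GB. `SmallFilesEstimate` and its methods are hypothetical names, and directories are ignored.

```java
public class SmallFilesEstimate {

    // Assumption: rule-of-thumb NameNode heap cost per file or block object.
    static final long BYTES_PER_NAMESPACE_OBJECT = 150;

    // Assumption: 128 MB HDFS block size.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Blocks occupied by one file (ceiling division; an empty file has no blocks).
    static long blocksPerFile(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Approximate NameNode heap for fileCount equal-sized files:
    // one object per file plus one object per block.
    static long namenodeBytes(long fileCount, long fileSizeBytes) {
        long objects = fileCount * (1 + blocksPerFile(fileSizeBytes));
        return objects * BYTES_PER_NAMESPACE_OBJECT;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024;
        // The same 1 TB of data stored two different ways.
        long small = namenodeBytes(1_048_576, oneMb);     // 1M files of 1 MB, 1 block each
        long large = namenodeBytes(1_024, 1_024 * oneMb); // 1024 files of 1 GB, 8 blocks each
        System.out.println("1M x 1 MB files:   ~" + small / oneMb + " MB of NameNode heap");  // ~300 MB
        System.out.println("1024 x 1 GB files: ~" + large / 1024 + " KB of NameNode heap"); // ~1350 KB
    }
}
```

Under these assumptions the small-file layout needs about 300 MB of NameNode heap for its file and block objects, versus roughly 1.3 MB for the large-file layout, even though both hold the same data.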
Rajesh Nagaraju
Ranch Hand

Joined: Nov 27, 2003
Posts: 62
The other aspect to consider when choosing a compression technique is whether it is splittable or not.
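Splittability can be illustrated with a small model of how many map tasks one input file yields. In Hadoop itself, `TextInputFormat` treats a compressed file as splittable only when its codec implements `org.apache.hadoop.io.compress.SplittableCompressionCodec` (the bzip2 codec does, gzip does not). The sketch below does not use Hadoop: the `SplitPlanner` class and `Codec` enum are hypothetical, it assumes the split size equals a 128 MB block size, and it treats each file as one compressed stream rather than a container format such as SequenceFile.

```java
public class SplitPlanner {

    // Hypothetical enum (not a Hadoop API): can a file compressed as a single
    // stream in this format be split across several mappers?
    enum Codec {
        NONE(true),     // uncompressed text
        DEFLATE(false),
        GZIP(false),
        BZIP2(true),    // bzip2 block boundaries can be located mid-stream
        SNAPPY(false),  // as a raw stream; splittable inside container formats
        LZ4(false);

        final boolean splittable;

        Codec(boolean splittable) {
            this.splittable = splittable;
        }
    }

    // Approximate input splits (map tasks) for one file with fileSizeBytes > 0,
    // assuming split size == block size and ignoring FileInputFormat's small
    // slack allowance for the final split.
    static long splitCount(long fileSizeBytes, long blockSizeBytes, Codec codec) {
        if (!codec.splittable) {
            return 1; // one mapper must decompress the whole stream from the start
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed 128 MB block size
        long fileSize = 1024L * 1024 * 1024; // hypothetical 1 GB input file
        for (Codec codec : Codec.values()) {
            System.out.printf("%-8s splittable=%-5b map tasks=%d%n",
                    codec, codec.splittable, splitCount(fileSize, blockSize, codec));
        }
    }
}
```

For the 1 GB file, the uncompressed and bzip2 cases give 8 map tasks while gzip gives 1, so a single mapper reads the whole file serially. This is how both answers in the original question can be true: compression can save a lot of space, but a non-splittable format limits how far a job over that file can be parallelized.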
 