JavaRanch » Java Forums » Databases » Hadoop

Hive Gzip Compression splitting supported now?

Darrel Riekhof
Greenhorn

Joined: May 21, 2009
Posts: 1
Does Hadoop now support splitting gzip files into blocks automatically? I have read that splitting doesn't work for tables using gzip compression in Hadoop/Hive here:

https://cwiki.apache.org/Hive/compressedstorage.html

From the above link: "in this case Hadoop will not be able to split your file into chunks/blocks and run multiple maps in parallel. This can cause under-utilization of your cluster's 'mapping' power."

However, when I load my table exactly as they describe, I notice that the gz file I load is definitely split up into blocks in the place HDFS stores my files. It looks like this after doing the load:



It is clearly being chopped up into 64 MB blocks during the load to HDFS.

Is this something they have added recently? I'm using Hadoop 1.0.4, r1393290, in pseudo-distributed mode.
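For what it's worth, there are two different kinds of "splitting" at play here. HDFS always stores a file as fixed-size blocks (64 MB by default in that era) no matter how the file is compressed, so seeing block files on disk doesn't mean gzip has become splittable. What the Hive wiki is warning about is input splitting for MapReduce: a single gzip stream can only be decompressed starting from its header, so a mapper handed the second block of the file has nowhere valid to start reading, and the whole file ends up going to one mapper. A minimal sketch using only the JDK (no Hadoop classes; the class and method names are mine for illustration) demonstrates why a reader can't resume mid-stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {

    // Compress a byte array with gzip.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Try to start decompressing at an arbitrary byte offset, the way a
    // mapper handed the second "block" of a gzip file would have to.
    // Returns true only if the stream can be read to the end from there.
    static boolean readableFrom(byte[] compressed, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed, offset,
                                         compressed.length - offset))) {
            while (in.read() != -1) { /* drain */ }
            return true;
        } catch (IOException e) {
            // The gzip header and DEFLATE state are only valid from offset 0.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "line of repeated sample text\n".repeat(5000)
                .getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(data);

        System.out.println("readable from offset 0:   "
                + readableFrom(compressed, 0));                     // true
        System.out.println("readable from the middle: "
                + readableFrom(compressed, compressed.length / 2)); // false
    }
}
```

This is why the usual workarounds in that generation of Hadoop were block-oriented codecs such as bzip2 (splittable) or storing data as many smaller gzip files / SequenceFiles, rather than one large .gz.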
 
 
 