JavaRanch » Java Forums » Databases » Hadoop

Hive Gzip Compression splitting supported now?

Darrel Riekhof
Greenhorn

Joined: May 21, 2009
Posts: 1
Does Hadoop automatically support splitting gzip files into blocks now? I have read here that splitting doesn't work for Hadoop/Hive tables that use gzip compression:

https://cwiki.apache.org/Hive/compressedstorage.html

From the above link: "in this case Hadoop will not be able to split your file into chunks/blocks and run multiple maps in parallel. This can cause under-utilization of your cluster's 'mapping' power."

However, when I load my table exactly as they describe, the .gz file I load is definitely split into blocks in the location where HDFS stores my files. It looks like this after the load:



It is clearly chopping the file into 64 MB blocks during the load into HDFS.

Is this something they have added recently? I'm using Hadoop 1.0.4 (r1393290) in pseudo-distributed mode.
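One way to reconcile the wiki quote with what the listing shows is that two different kinds of "splitting" are involved: HDFS stores every file as fixed-size blocks regardless of its contents, while the number of map tasks depends on whether the input format can start reading at an arbitrary offset. A gzip stream can only be decompressed from its beginning, so Hadoop's text input handling declines to split .gz files even though HDFS still stores them in 64 MB blocks. The sketch below is a simplified, hypothetical model (not Hadoop's actual classes): it assumes one input split per block for splittable files, treats only the `.gz` extension as non-splittable, and uses the Hadoop 1.x default block size of 64 MB. The file names and the 200 MB size are made up for illustration.

```java
/**
 * Simplified, hypothetical model (not Hadoop's real API) contrasting
 * HDFS storage blocks with MapReduce input splits (map tasks).
 */
public class BlocksVsSplits {

    /** Number of HDFS blocks a file occupies; HDFS chunks by byte count only. */
    static long hdfsBlocks(long fileSize, long blockSize) {
        // Ceiling division: a partial last block still counts as a block.
        return (fileSize + blockSize - 1) / blockSize;
    }

    /**
     * Number of map tasks, assuming one split per block when the file is
     * splittable and a single split for the whole file otherwise.
     */
    static long inputSplits(long fileSize, long blockSize, boolean splittable) {
        if (fileSize == 0) {
            return 0;
        }
        return splittable ? hdfsBlocks(fileSize, blockSize) : 1;
    }

    /** Hypothetical check: only gzip is treated as non-splittable here. */
    static boolean isSplittable(String fileName) {
        // A gzip reader must start at byte 0, so it cannot begin at a block boundary.
        return !fileName.endsWith(".gz");
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;  // Hadoop 1.x default block size
        long fileSize = 200 * mb;  // hypothetical table file size
        for (String name : new String[] {"table.txt", "table.txt.gz"}) {
            System.out.printf("%s: %d HDFS blocks, %d map task(s)%n",
                    name,
                    hdfsBlocks(fileSize, blockSize),
                    inputSplits(fileSize, blockSize, isSplittable(name)));
        }
    }
}
```

Under these assumptions both files occupy 4 HDFS blocks, but the gzip file is processed by a single map task, which matches the under-utilization the wiki warns about even though the blocks are visible on disk.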