Hive Gzip Compression splitting supported now?
Darrel Riekhof
Greenhorn
Joined: May 21, 2009
Posts: 1
Does Hadoop now automatically support splitting gzip files into blocks? I have read that splitting doesn't work for tables using gzip compression in Hadoop/Hive, here:
https://cwiki.apache.org/Hive/compressedstorage.html
From the above link: "in this case Hadoop will not be able to split your file into chunks/blocks and run multiple maps in parallel. This can cause under-utilization of your cluster's 'mapping' power."
However, when I load my table exactly as they describe, I notice that the .gz file I load is definitely split up into blocks in the location where my HDFS files are stored. It looks like this after doing the load:
It is clearly being chopped up into 64 MB blocks during the load to HDFS.
Is this something that was added recently? I'm using Hadoop 1.0.4, r1393290, in pseudo-distributed mode.
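One possible reading of the observation above, offered as a sketch rather than a confirmed answer: HDFS stores every file as fixed-size blocks regardless of its contents, while the wiki's warning is about MapReduce input splits, which are decided separately when a job runs. The toy model below only illustrates that distinction. The function names, the `.gz` lookup set, and the 200 MB file size are hypothetical, and the "one split per block" rule for splittable files is an approximation of the default behaviour.

```python
import math

BLOCK_SIZE_MB = 64  # block size reported in the post

# Assumption for illustration: extensions whose codec cannot be split.
NON_SPLITTABLE_EXTENSIONS = {".gz"}


def hdfs_block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Blocks used to store the file; independent of compression."""
    return math.ceil(file_size_mb / block_size_mb)


def map_split_count(filename, file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Approximate number of map tasks that would read the file."""
    if any(filename.endswith(ext) for ext in NON_SPLITTABLE_EXTENSIONS):
        # A gzip stream must be decompressed from the start,
        # so a single mapper reads every block of the file.
        return 1
    # Splittable input: roughly one split (and one mapper) per block.
    return hdfs_block_count(file_size_mb, block_size_mb)


# Hypothetical 200 MB file, stored as 4 blocks either way.
blocks = hdfs_block_count(200)
gz_splits = map_split_count("data.gz", 200)
txt_splits = map_split_count("data.txt", 200)
```

Under these assumptions, seeing several 64 MB blocks for a .gz file in HDFS would be consistent with the wiki's statement, since the blocks describe storage and the single split describes how the job reads them.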