
Hive Gzip Compression splitting supported now?

 
Darrel Riekhof
Greenhorn
Posts: 1
Does Hadoop automatically support splitting gzip files into blocks now? I have read here that splitting doesn't work for tables using gzip compression in Hadoop/Hive:

https://cwiki.apache.org/Hive/compressedstorage.html

From the above link: "in this case Hadoop will not be able to split your file into chunks/blocks and run multiple maps in parallel. This can cause under-utilization of your cluster's 'mapping' power."
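To show what I did: my load follows the pattern from that page, roughly like this (a sketch; the table name and the .gz path are placeholders, not the exact statements from the wiki):

# Create a plain text-backed table, then load a gzipped file into it as-is
hive -e "CREATE TABLE raw (line STRING);"
hive -e "LOAD DATA LOCAL INPATH '/tmp/weblog.log.gz' INTO TABLE raw;"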

However, when I load my table exactly as they describe, I notice that the .gz file I load is definitely split up into blocks in the location where HDFS stores my files. After the load, the data directory looks like this:

[HDFS block listing not captured]

It is clearly being chopped up into 64 MB blocks during the load to HDFS.
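For reference, here is how I can check the block layout from the shell (a sketch; /user/hive/warehouse/raw assumes the default Hive warehouse directory and the placeholder table name above):

# List the files under the table's directory and the HDFS blocks backing each one
hadoop fsck /user/hive/warehouse/raw -files -blocks -locations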

Is this something they have added recently? I'm using Hadoop 1.0.4, r1393290 in pseudo-distributed mode.
 