Hive Gzip Compression splitting supported now?

Darrel Riekhof
Greenhorn

Joined: May 21, 2009
Posts: 1
Does Hadoop now automatically support splitting gzip files into blocks? I have read here that splitting doesn't work for tables using gzip compression in Hadoop/Hive:

https://cwiki.apache.org/Hive/compressedstorage.html

From the above link: "in this case Hadoop will not be able to split your file into chunks/blocks and run multiple maps in parallel. This can cause under-utilization of your cluster's 'mapping' power."

However, when I load my table exactly as they describe, I notice that the .gz file I load is definitely split up into blocks in the location where HDFS stores my files. It looks like this after doing the load:



It is clearly chopping the file up into 64 MB blocks during the load to HDFS.

Is this something they have added recently? I'm using Hadoop 1.0.4, r1393290, in pseudo-cluster mode.
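
One possible reading of the observation above, offered as a sketch and not as a confirmed answer for this setup: HDFS cuts every file into fixed-size blocks for storage regardless of its format, while the wiki passage is about whether a map task can start reading in the middle of the file. The small Python example below imitates that difference with plain gzip data. The 1 KB `BLOCK_SIZE` and the helper names `split_into_blocks` and `can_decode_alone` are hypothetical stand-ins for illustration and are not Hadoop APIs.

```python
import gzip
import hashlib
import zlib

# Hypothetical stand-in for the 64 MB HDFS block size, kept tiny for the demo.
BLOCK_SIZE = 1024


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size chunks, as block storage would."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def can_decode_alone(block):
    """Return True if a gzip decoder accepts this block with no earlier bytes."""
    try:
        # wbits=31 tells zlib to expect a gzip header at the start of the input.
        zlib.decompressobj(31).decompress(block)
        return True
    except zlib.error:
        return False


# Deterministic, poorly compressible payload so the gzip output spans many blocks.
payload = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(400))
compressed = gzip.compress(payload)
blocks = split_into_blocks(compressed)

# Storage view: the blocks are just byte ranges and reassemble losslessly.
restored = gzip.decompress(b"".join(blocks))

# Processing view: only a reader that starts at the first block sees a gzip header.
decodable = [can_decode_alone(b) for b in blocks]
```

Here `restored == payload` holds, which mirrors the file being stored in blocks without trouble, while blocks after the first are generally rejected by the decoder because they do not begin with a gzip header. That is the sense in which a single gzip file would still be read by one map task even though it occupies several blocks.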
 