
Why does Hive run MapReduce jobs only for statements with a WHERE clause, not for plain SELECT statements?

 
Monica Shiralkar
Ranch Hand
Posts: 826
1
When I run a query in Hive such as "select * from tablename", no MapReduce job runs. But when I run "select * from tablename where -----", Hive starts a MapReduce job in the background. Why does it run MapReduce only when there is a WHERE clause? The plain query also returns faster than the one with the WHERE clause, presumably for the same reason. So what is the reason?
thanks
 
Tushar Sudake
Greenhorn
Posts: 2
Case 1: SELECT * FROM <table>;
In this case, the entire table contents are returned as they are. There is no precondition or filter of the kind a WHERE clause introduces.
Hive stores tables as files on HDFS, and AFAIK in this case Hive simply streams the file contents out (similar to 'cat' in Linux).
This is presumably an optimization: running an MR job would only slow the query down in this case.

Case 2: SELECT * FROM <table> WHERE <condition>;
In this case, the table contents must be passed through some kind of logic/filter to get the rows matching the condition.
As Hive is meant for huge data, this processing is done by taking advantage of the scalable, parallel Hadoop MapReduce framework.
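
For what it's worth, the behaviour in both cases can be inspected and tuned. The snippet below is a minimal sketch, assuming a Hive version that has the hive.fetch.task.conversion property (Hive 0.10 and later, as far as I know). 'tablename' is the table from the question, and 'some_col' is a hypothetical column used only for illustration.

```sql
-- Show the query plan without running the query. A plain SELECT * should be
-- planned as a single fetch stage rather than a MapReduce stage.
EXPLAIN SELECT * FROM tablename;

-- Ask Hive to convert simple filters into a fetch task as well.
-- Accepted values: none, minimal, more.
SET hive.fetch.task.conversion=more;

-- With 'more', a simple filter like this one may run without launching MapReduce.
-- (some_col is a hypothetical column name.)
SELECT * FROM tablename WHERE some_col = 'x';

-- Disable the optimization: even SELECT * should then go through a job.
SET hive.fetch.task.conversion=none;
```

Whether the conversion actually happens depends on the Hive version and its defaults, so check the EXPLAIN output on your own cluster rather than relying on this sketch.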

Hope this answers your question.
 