When I run a query in hive say "select * from tablename"---No map reduce runs.but when i run query "select * from tablename where -----" -It starts to run map reduce in the background. Why so does is run map reduce only in case of where clause? also the response comes faster in normal query than when with where clause for same reason...so whats the reason.
Case 1: SELECT * FROM <table>;
In this case, all the table contents are supposed to be delivered straight forward. There there isn't any 'precondition' or 'filter' as such which 'WHERE' clause introduces.
Hive stores tables as files on HDFS and AFAIK in this case Hive simply out streams that file contents (similar to 'cat' in Linux).
This must be part of optimization. Running MR job and slowing the query doesn't make sense in this case.
Case 2: SELECT * FROM <table> WHERE <condition;>
In this case, table contents must be processed through some kind of logic/filter to get rows matching the condition.
As Hive is meant for huge data, this processing is done by taking advantage of scalable, parallel Hadoop map reduce framework.