posted 3 years ago
When creating an EMR cluster, we can select the services we want to use, such as Spark, Hadoop, etc. Hadoop MapReduce is not used much nowadays for new projects, and Spark is preferred. HDFS would not be required either, since EMR would use S3 through EMRFS. So why would one choose Hadoop when creating an EMR cluster if Spark is already selected? Is it mainly because of Hive, which is a component of the Hadoop ecosystem used for ad hoc analysis?
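To make the question concrete, here is a rough sketch of the two selections I am comparing, written as the arguments one might pass to boto3's EMR `run_job_flow` call. The helper `build_cluster_request`, the cluster name, and the release label are placeholders I made up for illustration. Nothing here contacts AWS, and a real request would also need instance and IAM role settings.

```python
# Hypothetical helper: builds part of the keyword arguments for boto3's
# EMR client method run_job_flow. It only constructs a dict and does
# not call AWS.
def build_cluster_request(applications, release_label="emr-6.15.0"):
    return {
        "Name": "example-cluster",       # placeholder cluster name
        "ReleaseLabel": release_label,   # assumed release, adjust as needed
        # EMR expects one {"Name": ...} entry per selected application.
        "Applications": [{"Name": app} for app in applications],
    }


# Selection 1: Spark only.
spark_only = build_cluster_request(["Spark"])

# Selection 2: Spark plus Hadoop and Hive explicitly selected.
spark_hadoop_hive = build_cluster_request(["Spark", "Hadoop", "Hive"])

print(spark_only["Applications"])
print(spark_hadoop_hive["Applications"])
```

The question is what the second selection gives me in practice that the first one does not.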
Thanks.