posted 4 years ago
I have seen the following two approaches to running a Hadoop cluster in the cloud:
Approach 1: Create as many virtual machines in the cloud as you want nodes. Install Hadoop on those nodes and create a Hadoop cluster. Keep these virtual machines up and running, and so keep paying for them continuously.
Approach 2: Write a Unix script containing all the commands, from creating the virtual machines to creating the Hadoop cluster. Run this script to create the virtual machines and then the Hadoop cluster. Use the cluster to do your processing. After your work is done, shut down the virtual machines and delete them. The next time you have work to do, run the script again to recreate the virtual machines and the cluster, and then do your processing. And so on.
This approach is cheaper because the cluster is up and running only when required.
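To make the cost difference concrete, here is a rough sketch of the arithmetic in Python. The node count, hourly rate, and usage hours are made-up numbers for illustration, not real cloud pricing, and the sketch ignores things like persistent storage charges and the time spent rebuilding the cluster on each run.

```python
def compute_cost(nodes, hourly_rate, hours_up):
    """Cost of keeping `nodes` VMs running for `hours_up` hours each."""
    return nodes * hourly_rate * hours_up


# All values below are hypothetical assumptions, not real prices.
NODES = 5                 # cluster size
HOURLY_RATE = 0.20        # cost per VM-hour
HOURS_IN_MONTH = 24 * 30  # Approach 1: VMs never shut down
JOB_HOURS = 4 * 20        # Approach 2: 4 hours a day, 20 days a month

always_on = compute_cost(NODES, HOURLY_RATE, HOURS_IN_MONTH)
on_demand = compute_cost(NODES, HOURLY_RATE, JOB_HOURS)

print(f"Approach 1 (always on): {always_on:.2f} per month")
print(f"Approach 2 (on demand): {on_demand:.2f} per month")
```

With these assumed numbers, Approach 2 pays only for the hours the jobs actually run, so the saving grows the less often the cluster is needed.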
Which of these approaches is more commonly used in the industry? Thanks.