From the book Hadoop: The Definitive Guide:
The decommissioning process is controlled by an exclude file, which for HDFS is set by the dfs.hosts.exclude property and for MapReduce by the mapred.hosts.exclude property. It is often the case that these properties refer to the same file. The exclude file lists the nodes that are not permitted to connect to the cluster.
To remove nodes from the cluster:
1. Add the network addresses of the nodes to be decommissioned to the exclude file. Do not update the include file at this point.
2. Update the namenode with the new set of permitted datanodes, with this command:
% hadoop dfsadmin -refreshNodes
3. Update the jobtracker with the new set of permitted tasktrackers using:
% hadoop mradmin -refreshNodes
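The three steps above can be sketched as a small shell helper. This is a minimal sketch, not something from the book. The function names add_to_exclude and decommission are hypothetical. Any exclude-file path and host names you pass to them are placeholders for your own cluster's values. The helper also assumes that dfs.hosts.exclude and mapred.hosts.exclude point to the same file, which the book notes is often the case.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching steps 1-3 above. Assumes dfs.hosts.exclude and
# mapred.hosts.exclude name the same file (the path is supplied by the caller).

# Step 1: append each host to the exclude file unless it is already listed
# as a whole line. The include file is deliberately left untouched.
add_to_exclude() {
  local exclude_file=$1
  shift
  local host
  for host in "$@"; do
    grep -qxF "$host" "$exclude_file" 2>/dev/null \
      || echo "$host" >> "$exclude_file" \
      || return 1
  done
}

# Steps 1-3 together: update the exclude file, then tell the namenode
# (dfsadmin) and the jobtracker (mradmin) to re-read their permitted hosts.
decommission() {
  local exclude_file=$1
  shift
  add_to_exclude "$exclude_file" "$@" || return 1
  hadoop dfsadmin -refreshNodes || return 1
  hadoop mradmin -refreshNodes
}
```

For example, decommission /path/to/exclude datanode3.example.com would add that (placeholder) host and refresh both daemons. Because add_to_exclude skips hosts that are already listed, re-running it does not create duplicate entries.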
Thanks for the reply
Also, how long does decommissioning a datanode take?
When I followed the steps above, it took a very long time and the status stayed at "Decommission in progress". It never changed to "Decommissioned".
Also, what happens to the data stored on the datanode being decommissioned?
HDFS replicates the data on that datanode to the remaining available datanodes. I have not tried this in practice. However, decommissioning should begin as soon as you issue "% hadoop dfsadmin -refreshNodes". The node is reported as "Decommissioned" only after all of its blocks have been copied elsewhere.
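To check progress from the command line, you can filter the per-datanode status that hadoop dfsadmin -report prints. This is a minimal sketch. The function name decommission_status is hypothetical. The parsing assumes that each datanode section of the report contains a "Name: ..." line followed later by a "Decommission Status : ..." line whose value is Normal, Decommission in progress, or Decommissioned.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads "hadoop dfsadmin -report" output on stdin and
# prints one line per datanode: its Name value, a tab, its decommission status.
decommission_status() {
  awk -F': ' '
    /^Name: /                 { name = $2 }              # remember current node
    /^Decommission Status : / { print name "\t" $2 }     # emit node and status
  '
}
```

Typical use would be hadoop dfsadmin -report | decommission_status, re-run periodically until the node you are removing shows Decommissioned.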