Rack awareness

Hadoop divides data into file blocks and stores replicas on different machines. If rack awareness is not configured, Hadoop may place all replicas of a block on nodes in the same rack, which results in data loss if that rack fails.

Below are the steps to configure the rack awareness policy manually:

** Stop the cluster.

** Copy the two files rack_topology.sh (the rack topology script) and topology.data to the /etc/gphd/hadoop/conf directory on all cluster NameNodes (phdmst01 and phdmst02), and make sure rack_topology.sh is executable, since the NameNode invokes it as an external command.

** Add the following property to core-site.xml:

<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/gphd/hadoop/conf/rack_topology.sh</value>
</property>

 

[root@phdmst01 conf]# pwd

/etc/gphd/hadoop/conf

Rack topology script

[root@phdmst01 conf]# more rack_topology.sh

#!/bin/bash
HADOOP_CONF=/etc/gphd/hadoop/conf

while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec < ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done
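Before restarting HDFS you can sanity-check the script by hand against a throwaway copy of topology.data. The sketch below is illustrative: it works in a temp directory and sets HADOOP_CONF to the current directory instead of /etc/gphd/hadoop/conf, so nothing on the cluster is touched. A mapped IP should resolve to its rack, and an unknown IP should fall back to /default/rack:

```shell
#!/bin/bash
# Standalone test of the topology script logic in a temp directory.
cd "$(mktemp -d)"

# Minimal topology.data with one entry per rack (sample values).
cat > topology.data <<'EOF'
192.168.129.56 /bcc/rack1
192.168.129.59 /bcc/rack2
EOF

# Same script as above, except HADOOP_CONF points at the current
# directory so the test does not read /etc/gphd/hadoop/conf.
cat > rack_topology.sh <<'EOF'
#!/bin/bash
HADOOP_CONF=.
while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec < ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done
EOF

# A mapped IP resolves to its rack; an unmapped one gets /default/rack.
bash rack_topology.sh 192.168.129.56 10.0.0.99
# prints "/bcc/rack1 " followed by "/default/rack "
```

The trailing space after each rack comes from the script's echo -n calls; the NameNode parses the space-separated racks in the same order as the arguments it passed in.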

[root@phdmst01 conf]# more topology.data

192.168.129.56           /bcc/rack1

192.168.129.57           /bcc/rack1

192.168.129.58           /bcc/rack1

192.168.129.59           /bcc/rack2

192.168.129.60           /bcc/rack2

192.168.129.61           /bcc/rack2

Verify Rack Awareness

The hdfs dfsadmin -printTopology command will show the topology:

-bash-4.1$ hdfs dfsadmin -printTopology

Rack: /bcc/rack1

192.168.129.56:50010 (phddna01.mydev.com)

192.168.129.57:50010 (phddna02.mydev.com)

192.168.129.58:50010 (phddna03.mydev.com)

Rack: /bcc/rack2

192.168.129.59:50010 (phddnb01.mydev.com)

192.168.129.60:50010 (phddnb02.mydev.com)

192.168.129.61:50010 (phddnb03.mydev.com)
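As an additional check, the -printTopology listing can be summarized with a short awk script that counts DataNodes per rack. The sketch below embeds the listing shown above so it is self-contained; on a live cluster you would pipe `hdfs dfsadmin -printTopology` into the same awk program:

```shell
#!/bin/bash
# Count DataNodes per rack from `hdfs dfsadmin -printTopology` output.
# The sample listing from above is embedded here for illustration.
hdfs_output='Rack: /bcc/rack1
192.168.129.56:50010 (phddna01.mydev.com)
192.168.129.57:50010 (phddna02.mydev.com)
192.168.129.58:50010 (phddna03.mydev.com)
Rack: /bcc/rack2
192.168.129.59:50010 (phddnb01.mydev.com)
192.168.129.60:50010 (phddnb02.mydev.com)
192.168.129.61:50010 (phddnb03.mydev.com)'

# "Rack:" lines set the current rack; every following non-empty
# line is a DataNode belonging to that rack.
echo "$hdfs_output" | awk '
  /^Rack:/ { rack=$2; next }
  NF       { count[rack]++ }
  END      { for (r in count) print r, count[r] }' | sort
# prints:
# /bcc/rack1 3
# /bcc/rack2 3
```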

You can also verify with the following commands:

– hadoop fsck / -files -blocks -racks

– hdfs dfsadmin -report

2) Configure rack awareness with Ambari
