Rack awareness

Hadoop divides data into blocks and stores the block replicas on different machines. If rack awareness is not configured, Hadoop may place all replicas of a block on the same rack, which results in data loss when that rack fails.

Below are the steps to configure the rack awareness policy manually:

** Stop the cluster.

** Copy the two files rack_topology.sh (the rack topology script) and topology.data to the /etc/gphd/hadoop/conf directory on all cluster NameNodes (phdmst01 and phdmst02).

** Add the following property to core-site.xml:

<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/gphd/hadoop/conf/rack_topology.sh</value>
</property>
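As a quick sanity check that the property landed in the file, you can grep for it. The sketch below writes a minimal sample core-site.xml to a temporary directory purely for illustration; on a real node you would point the grep at /etc/gphd/hadoop/conf/core-site.xml instead:

```shell
# Write a minimal sample core-site.xml into a temp dir (illustration only;
# on a real node the file lives at /etc/gphd/hadoop/conf/core-site.xml).
TMP=$(mktemp -d)
cat > "$TMP/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/gphd/hadoop/conf/rack_topology.sh</value>
  </property>
</configuration>
EOF

# One-line sanity check: the property name should appear exactly once.
grep -c 'net.topology.script.file.name' "$TMP/core-site.xml"
```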

 

[root@phdmst01 conf]# pwd

/etc/gphd/hadoop/conf

Rack topology script

[root@phdmst01 conf]# more rack_topology.sh
#!/bin/bash
HADOOP_CONF=/etc/gphd/hadoop/conf

while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec < ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done

[root@phdmst01 conf]# more topology.data
192.168.129.56           /bcc/rack1
192.168.129.57           /bcc/rack1
192.168.129.58           /bcc/rack1
192.168.129.59           /bcc/rack2
192.168.129.60           /bcc/rack2
192.168.129.61           /bcc/rack2
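To see the lookup logic in action without a cluster, the sketch below recreates a small topology.data in a temporary directory and resolves a few addresses; an unknown address falls back to /default/rack, exactly as in the script above (the IPs and rack paths mirror the example, but this is an illustration, not a cluster test):

```shell
# Recreate a small topology.data in a temp dir and reproduce the script's
# lookup loop as a function, then resolve known and unknown addresses.
TOPO_DIR=$(mktemp -d)
cat > "$TOPO_DIR/topology.data" <<'EOF'
192.168.129.56 /bcc/rack1
192.168.129.59 /bcc/rack2
EOF

lookup() {
  local nodeArg=$1 result=""
  while read -r ip rack ; do
    [ "$ip" = "$nodeArg" ] && result=$rack
  done < "$TOPO_DIR/topology.data"
  # Unknown hosts fall back to the default rack, as in rack_topology.sh.
  echo -n "${result:-/default/rack} "
}

lookup 192.168.129.56   # -> /bcc/rack1
lookup 192.168.129.59   # -> /bcc/rack2
lookup 10.0.0.1         # -> /default/rack
echo
```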

Verify Rack Awareness

The hdfs dfsadmin -printTopology command shows the topology:

-bash-4.1$ hdfs dfsadmin -printTopology
Rack: /bcc/rack1
192.168.129.56:50010 (phddna01.mydev.com)
192.168.129.57:50010 (phddna02.mydev.com)
192.168.129.58:50010 (phddna03.mydev.com)
Rack: /bcc/rack2
192.168.129.59:50010 (phddnb01.mydev.com)
192.168.129.60:50010 (phddnb02.mydev.com)
192.168.129.61:50010 (phddnb03.mydev.com)

You can also verify with the following commands:

– hadoop fsck /

– hdfs dfsadmin -report

2) Configure rack awareness with Ambari

Setting up the HDFS NFS Gateway

The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client's local file system.

Set up the NFS Gateway to access HDFS data

Install the HDFS NFS packages:

 

#yum install hadoop-hdfs-nfs3.x86_64

#yum install hadoop-hdfs-portmap

To start the portmap and NFS gateway daemons, run either:

$ sudo service hadoop-hdfs-portmap start

$ sudo service hadoop-hdfs-nfs3 start

 

or

 

$ sudo /etc/init.d/hadoop-hdfs-portmap start

$ sudo /etc/init.d/hadoop-hdfs-nfs3 start

 

Verify the NFS-related services:

[root@phdmst04 ~]# rpcinfo -p phdmst04
   program vers proto   port  service
    100005    1   tcp   4242  mountd
    100000    2   udp    111  portmapper
    100005    3   tcp   4242  mountd
    100005    2   udp   4242  mountd
    100003    3   tcp   2049  nfs
    100000    2   tcp    111  portmapper
    100005    3   udp   4242  mountd
    100005    1   udp   4242  mountd
    100005    2   tcp   4242  mountd

 

[root@phdmst04 ~]# showmount -e phdmst04
Export list for phdmst04:
/ 192.168.129.55/255.255.255.0

[root@phdmst04 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,noatime phdmst04:/ /data/hdfs_mnt
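To make the mount survive a reboot, an /etc/fstab entry along these lines can be added (a sketch assuming the same gateway host, mount point, and options as the mount command above; adjust for your environment):

```
phdmst04:/   /data/hdfs_mnt   nfs   vers=3,proto=tcp,nolock,noatime   0 0
```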

[root@phdmst04 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_cmri-lv_root
                      202G   11G  182G   6% /
tmpfs                  95G     0   95G   0% /dev/shm
/dev/sda1             485M   66M  394M  15% /boot
/dev/mapper/vg_cmri-lv_home
                      9.9G  2.1G  7.3G  23% /home
/dev/mapper/vg_data-lv_data
                      493G  243M  467G   1% /data
phdmst04:/            323T  3.3T  320T   2% /data/hdfs_mnt

Troubleshooting

Check the nfs3 and portmap status:

[root@blpphdmst04 init.d]# ./hadoop-hdfs-portmap status
portmap is stopped
[root@blpphdmst04 init.d]# ./hadoop-hdfs-portmap start
starting portmap, logging to /var/log/gphd/hadoop-hdfs/hadoop-hdfs-portmap-blpphdmst04.mydev.com.out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

[  OK  ]

[root@blpphdmst04 init.d]# ./hadoop-hdfs-nfs3 status
nfs3 is stopped

[root@blpphdmst04 init.d]# ./hadoop-hdfs-nfs3 start
starting nfs3, logging to /var/log/gphd/hadoop-hdfs/hadoop-hdfs-nfs3-blpphdmst04.mydev.com.out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

[  OK  ]
[root@blpphdmst04 init.d]# mount -t nfs -o vers=3,proto=tcp,nolock,noatime blpphdmst04:/ /data/hdfs_mnt
[root@blpphdmst04 init.d]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_cmri-lv_root
202G   16G  176G   9% /
tmpfs                  95G     0   95G   0% /dev/shm
/dev/sda1             477M   89M  363M  20% /boot
/dev/mapper/vg_cmri-lv_home
9.8G  2.0G  7.3G  22% /home
/dev/mapper/vg_data-lv_data
493G  117M  467G   1% /data
blpphdmst04:/      269T  2.9T  266T   2% /data/hdfs_mnt

 

Hadoop Commands

Create a directory in HDFS

$ hdfs dfs -mkdir /user/mike

-bash-4.1$ hadoop fs -mkdir hdfs://sphdmst01.dev.com/user/ovi/test
-bash-4.1$ hadoop fs -ls hdfs://sphdmst01.dev.com/user/ovi/
Found 3 items
drwxr-xr-x   – gpadmin hadoop          0 2015-07-23 16:20 hdfs://sphdmst01.dev.com/user/ovi/ovi.har
drwxr-xr-x   – gpadmin hadoop          0 2015-07-24 11:59 hdfs://sphdmst01.dev.com/user/ovi/ovi2.har
drwxr-xr-x   – gpadmin hadoop          0 2015-09-18 16:43 hdfs://sphdmst01.dev.com/user/ovi/test

Copy files from the local file system to the destination file system:

$ hadoop fs -put test.txt /user/mike/

Download a file from HDFS to the local file system:

$ hadoop fs -get /user/mike/test.txt /home

 

List the contents of a directory

$ hdfs dfs -ls /user/mike
Found 1 items
-rw-r--r--   3 gpadmin hadoop         15 2015-06-04 11:04 /user/mike/test.txt

 

$ hdfs dfs -cat /user/mike/test.txt
just a test

 

$ hdfs dfs -rm /user/mike/test.txt
15/06/04 11:40:00 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 86400000 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://dev/user/mike/test.txt' to trash at: hdfs://dev/user/gpadmin/.Trash/Current

 

getmerge takes a source directory as input and concatenates the files in src into a single destination local file:

$ hadoop fs -put test1.txt /user/mike
$ hadoop fs -put test2.txt /user/mike

$ hadoop fs -ls /user/mike
Found 2 items
-rw-r--r--   3 gpadmin hadoop         26 2015-06-09 11:10 /user/mike/test1.txt
-rw-r--r--   3 gpadmin hadoop         28 2015-06-09 11:10 /user/mike/test2.txt

$ hadoop fs -getmerge /user/mike /tmp/output.txt

$ more /tmp/output.txt
just a test
just a test
just a test2
just a test2
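The effect of getmerge is essentially a concatenation of the directory's files in sorted name order. The local-filesystem sketch below mimics it with plain cat, using temporary files only (no HDFS involved), to make the semantics concrete:

```shell
# Mimic hadoop fs -getmerge on the local filesystem: concatenate all
# files in a source directory, in sorted name order, into one output file.
SRC=$(mktemp -d)
printf 'just a test\n'  > "$SRC/test1.txt"
printf 'just a test2\n' > "$SRC/test2.txt"

MERGED=$(mktemp)
cat "$SRC"/* > "$MERGED"   # the shell glob expands in sorted order

cat "$MERGED"
```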

Check file system

$ hadoop fsck /

…………………………………………………………………

Total size:    1252660561619 B
Total dirs:    784
Total files:   43391
Total symlinks:                0 (Files currently being written: 6)
Total blocks (validated):      23155 (avg. block size 54098922 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks:   23155 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     3.0
Corrupt blocks:                0
Missing replicas:              0 (0.0 %)
Number of data-nodes:          4
Number of racks:               1
FSCK ended at Thu Jun 04 10:54:29 EDT 2015 in 1544 milliseconds

The filesystem under path '/' is HEALTHY

To view a list of all the blocks and their locations, run:

$ hadoop fsck / -files -blocks -locations

 

 

$ hdfs dfs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> … <dst>]
[-cat [-ignoreCrc] <src> …]
[-checksum <src> …]
[-chgrp [-R] GROUP PATH…]
[-chmod [-R] <MODE[,MODE]… | OCTALMODE> PATH…]
[-chown [-R] [OWNER][:[GROUP]] PATH…]
[-copyFromLocal [-f] [-p] <localsrc> … <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> … <localdst>]
[-count [-q] <path> …]
[-cp [-f] [-p] <src> … <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> …]]
[-du [-s] [-h] <path> …]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> … <localdst>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd …]]
[-ls [-d] [-h] [-R] [<path> …]]
[-mkdir [-p] <path> …]
[-moveFromLocal <localsrc> … <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> … <dst>]
[-put [-f] [-p] <localsrc> … <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> …]
[-rmdir [--ignore-fail-on-non-empty] <dir> …]
[-setrep [-R] [-w] <rep> <path> …]
[-stat [format] <path> …]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> …]
[-touchz <path> …]
[-usage [cmd …]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

 

Hadoop dfsadmin Command Options

$ hdfs dfsadmin
Usage: java DFSAdmin
Note: Administrative commands can only be run as the HDFS superuser.
[-report]
[-safemode enter | leave | get | wait]
[-allowSnapshot <snapshotDir>]
[-disallowSnapshot <snapshotDir>]
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-finalizeUpgrade]
[-metasave filename]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-printTopology]
[-refreshNamenodes datanodehost:port]
[-deleteBlockPool datanode-host:port blockpoolId [force]]
[-setQuota <quota> <dirname>…<dirname>]
[-clrQuota <dirname>…<dirname>]
[-setSpaceQuota <quota> <dirname>…<dirname>]
[-clrSpaceQuota <dirname>…<dirname>]
[-setBalancerBandwidth <bandwidth in bytes per second>]
[-fetchImage <local directory>]
[-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

 

$ hdfs dfsadmin -safemode get
Safe mode is OFF

 

Hadoop haadmin  Command Options

$ hdfs haadmin
Usage: DFSHAAdmin [-ns <nameserviceId>]
[-transitionToActive <serviceId>]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
[-help <command>]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Example:

$ hdfs haadmin -getServiceState nn1
active

$ hdfs haadmin -getServiceState nn2
standby

$ hdfs haadmin -checkHealth nn1
$ hdfs haadmin -checkHealth nn2

 

[gpadmin@phdmst01 ~]$ hdfs getconf
hdfs getconf is utility for getting configuration information from the config file.

hadoop getconf
[-namenodes]                    gets list of namenodes in the cluster.
[-secondaryNameNodes]                   gets list of secondary namenodes in the cluster.
[-backupNodes]                  gets list of backup nodes in the cluster.
[-includeFile]                  gets the include file path that defines the datanodes that can join the cluster.
[-excludeFile]                  gets the exclude file path that defines the datanodes that need to decommissioned.
[-nnRpcAddresses]                       gets the namenode rpc addresses
[-confKey [key]]                        gets a specific key from the configuration

Example:

 

[gpadmin@phdmst01 ~]$ hdfs getconf -namenodes
phdmst01.mydev.com phdmst02.mydev.com

 

[gpadmin@phdmst01 ~]$ hdfs getconf -nnRpcAddresses
phdmst01.mydev.com:8020
phdmst02.mydev.com:8020

 

Yarn

$ yarn node -list
15/06/05 14:26:11 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/192.168.68.131:8032
Total Nodes:2
Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
phddnb01.mydev.com:16638              RUNNING        phddnb01.mydev.com:8042                                  0
phddna01.mydev.com:58002              RUNNING        phddna01.mydev.com:8042                                  0

 

$ yarn node -status phddnb01.mydev.com:16638
15/06/05 14:31:03 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/192.168.68.131:8032
Node Report :
Node-Id : phddnb01.mydev.com:16638
Rack : /default-rack
Node-State : RUNNING
Node-Http-Address : phddnb01.mydev.com:8042
Last-Health-Update : Fri 05/Jun/15 02:29:06:575EDT
Health-Report :
Containers : 0
Memory-Used : 0MB
Memory-Capacity : 8192MB
CPU-Used : 0 vcores
CPU-Capacity : 8 vcores

$ yarn
Usage: yarn [–config confdir] COMMAND
where COMMAND is one of:
resourcemanager      run the ResourceManager
nodemanager          run a nodemanager on each slave
rmadmin              admin tools
version              print the version
jar <jar>            run a jar file
application          prints application(s) report/kill application
node                 prints node report(s)
logs                 dump container logs
classpath            prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog            get/set the log level for each daemon
or
CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

$ yarn version
Hadoop 2.2.0-gphd-3.0.1.0
Source code repository: ssh://git@stash.greenplum.com:2222/phd/hadoop.git -r 3055df0b53cf992665913380a1651345c477a0d2
Compiled by pivotal on 2014-04-14T03:38Z
Compiled with protoc 2.5.0
From source with checksum 93b8d74f534acdc126e8575bba69fc70
This command was run using /usr/lib/gphd/hadoop/hadoop-common-2.2.0-gphd-3.0.1.0.jar

 

$ yarn rmadmin
Usage: java RMAdmin
[-refreshQueues]
[-refreshNodes]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshAdminAcls]
[-refreshServiceAcl]
[-getGroups [username]]
[-updateNodeResource [NodeID][MemSize][Cores]]
[-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

 

$ yarn rmadmin -getGroups gpadmin

15/06/18 11:37:27 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/192.168.68.135:8033
gpadmin : gpadmin hadoop