Hadoop Commands

Create a directory in HDFS

$ hdfs dfs -mkdir /user/mike

-bash-4.1$ hadoop fs -mkdir hdfs://sphdmst01.dev.com/user/ovi/test
-bash-4.1$ hadoop fs -ls hdfs://sphdmst01.dev.com/user/ovi/
Found 3 items
drwxr-xr-x   - gpadmin hadoop          0 2015-07-23 16:20 hdfs://sphdmst01.dev.com/user/ovi/ovi.har
drwxr-xr-x   - gpadmin hadoop          0 2015-07-24 11:59 hdfs://sphdmst01.dev.com/user/ovi/ovi2.har
drwxr-xr-x   - gpadmin hadoop          0 2015-09-18 16:43 hdfs://sphdmst01.dev.com/user/ovi/test

Copies files from the local file system to the destination file system

$ hadoop fs -put test.txt /user/mike/

Copies files from HDFS to the local file system

$ hadoop fs -get /user/mike/test.txt /home
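A quick way to sanity-check a put/get round trip is to compare local checksums of the original and the retrieved copy. A minimal sketch (the paths are illustrative; cksum is plain POSIX):

```shell
# Round-trip check: upload, download under a new name, compare checksums.
# Assumes test.txt exists locally and /user/mike is writable.
hdfs dfs -put -f test.txt /user/mike/
hdfs dfs -get /user/mike/test.txt /tmp/test.txt.copy

# The first field of cksum output is the CRC.
a=$(cksum < test.txt | cut -d' ' -f1)
b=$(cksum < /tmp/test.txt.copy | cut -d' ' -f1)
[ "$a" = "$b" ] && echo "round trip OK"
```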

 

List the contents of a directory

$ hdfs dfs -ls /user/mike
Found 1 items
-rw-r--r--   3 gpadmin hadoop         15 2015-06-04 11:04 /user/mike/test.txt

 

$ hdfs dfs -cat /user/mike/test.txt
just a test

 

$ hdfs dfs -rm /user/mike/test.txt
15/06/04 11:40:00 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 86400000 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://dev/user/mike/test.txt' to trash at: hdfs://dev/user/gpadmin/.Trash/Current
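The move-to-trash above is reversible: a trashed file keeps its full original path under the user's trash root, so restoring it is an ordinary -mv. A hedged sketch (the trash location follows the output above; -skipTrash and -expunge are standard options):

```shell
# Delete immediately, bypassing the trash (no recovery possible):
hdfs dfs -rm -skipTrash /user/mike/test.txt

# Restore a trashed file: its original path is appended to the trash root,
# so a plain move puts it back where it was.
hdfs dfs -mv /user/gpadmin/.Trash/Current/user/mike/test.txt /user/mike/

# Empty the trash ahead of the deletion interval:
hdfs dfs -expunge
```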

 

getmerge takes a source directory as input and concatenates its files into a single destination file on the local file system

$ hadoop fs -put test1.txt /user/mike
$ hadoop fs -put test2.txt /user/mike

$ hadoop fs -ls /user/mike
Found 2 items
-rw-r--r--   3 gpadmin hadoop         26 2015-06-09 11:10 /user/mike/test1.txt
-rw-r--r--   3 gpadmin hadoop         28 2015-06-09 11:10 /user/mike/test2.txt

$ hadoop fs -getmerge /user/mike /tmp/output.txt

$ more /tmp/output.txt
just a test
just a test
just a test2
just a test2
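getmerge concatenates the directory's files in name order, which is why test1.txt's lines come first above; adding -nl appends a newline after each input file. A purely local demonstration of the same effect (the /tmp paths are illustrative):

```shell
# Recreate the two inputs locally and concatenate them in name order,
# which is what getmerge does on the HDFS side.
printf 'just a test\njust a test\n'   > /tmp/test1.txt
printf 'just a test2\njust a test2\n' > /tmp/test2.txt
cat /tmp/test1.txt /tmp/test2.txt     > /tmp/output.txt
```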

Check file system

$ hadoop fsck /

...........................................................................

Total size:    1252660561619 B
Total dirs:    784
Total files:   43391
Total symlinks:                0 (Files currently being written: 6)
Total blocks (validated):      23155 (avg. block size 54098922 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks:   23155 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     3.0
Corrupt blocks:                0
Missing replicas:              0 (0.0 %)
Number of data-nodes:          4
Number of racks:               1
FSCK ended at Thu Jun 04 10:54:29 EDT 2015 in 1544 milliseconds

The filesystem under path '/' is HEALTHY

To view a list of all blocks and their locations, use:

$ hadoop fsck / -files -blocks -locations

 

 

$ hdfs dfs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

 

Hadoop dfsadmin Command Options

$ hdfs dfsadmin
Usage: java DFSAdmin
Note: Administrative commands can only be run as the HDFS superuser.
[-report]
[-safemode enter | leave | get | wait]
[-allowSnapshot <snapshotDir>]
[-disallowSnapshot <snapshotDir>]
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-finalizeUpgrade]
[-metasave filename]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-printTopology]
[-refreshNamenodes datanodehost:port]
[-deleteBlockPool datanode-host:port blockpoolId [force]]
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]
[-setSpaceQuota <quota> <dirname>...<dirname>]
[-clrSpaceQuota <dirname>...<dirname>]
[-setBalancerBandwidth <bandwidth in bytes per second>]
[-fetchImage <local directory>]
[-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

 

$ hdfs dfsadmin -safemode get
Safe mode is OFF
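In maintenance scripts the wait subcommand blocks until the NameNode leaves safe mode; alternatively, the output of get can be branched on. A sketch:

```shell
# Block until safe mode is off (useful right after a NameNode restart):
hdfs dfsadmin -safemode wait

# Or poll once and branch on the reported state:
state=$(hdfs dfsadmin -safemode get)
case "$state" in
  *ON*)  echo "still in safe mode" ;;
  *OFF*) echo "ready for writes" ;;
esac
```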

 

Hadoop haadmin  Command Options

$ hdfs haadmin
Usage: DFSHAAdmin [-ns <nameserviceId>]
[-transitionToActive <serviceId>]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
[-help <command>]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Example:

$ hdfs haadmin -getServiceState nn1
active

$ hdfs haadmin -getServiceState nn2
standby

$ hdfs haadmin -checkHealth nn1
$ hdfs haadmin -checkHealth nn2
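The per-id queries above can be wrapped in a small helper that reports which service id is currently active. A sketch (the helper is plain shell; nn1/nn2 match the examples):

```shell
# pick_active reads "id state" pairs on stdin and prints the first active id.
pick_active() {
  while read -r id state; do
    if [ "$state" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}

# Feed it live states from haadmin:
for nn in nn1 nn2; do
  echo "$nn $(hdfs haadmin -getServiceState "$nn" 2>/dev/null)"
done | pick_active
```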

 

[gpadmin@phdmst01 ~]$ hdfs getconf
hdfs getconf is utility for getting configuration information from the config file.

hadoop getconf
[-namenodes]                    gets list of namenodes in the cluster.
[-secondaryNameNodes]                   gets list of secondary namenodes in the cluster.
[-backupNodes]                  gets list of backup nodes in the cluster.
[-includeFile]                  gets the include file path that defines the datanodes that can join the cluster.
[-excludeFile]                  gets the exclude file path that defines the datanodes that need to decommissioned.
[-nnRpcAddresses]                       gets the namenode rpc addresses
[-confKey [key]]                        gets a specific key from the configuration

Example:

 

[gpadmin@phdmst01 ~]$ hdfs getconf -namenodes
phdmst01.mydev.com phdmst02.mydev.com

 

[gpadmin@phdmst01 ~]$ hdfs getconf -nnRpcAddresses
phdmst01.mydev.com:8020
phdmst02.mydev.com:8020
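-confKey is handy for reading a single property, and the address list above can be split with ordinary text tools. A sketch (the property names are standard HDFS keys):

```shell
# Read single configuration properties:
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication

# Extract the unique RPC port(s) from the address list:
hdfs getconf -nnRpcAddresses | awk -F: '{print $2}' | sort -u
```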

 

Yarn

$ yarn node -list
15/06/05 14:26:11 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/192.168.68.131:8032
Total Nodes:2
Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
phddnb01.mydev.com:16638              RUNNING        phddnb01.mydev.com:8042                                  0
phddna01.mydev.com:58002              RUNNING        phddna01.mydev.com:8042                                  0
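The listing is easy to post-process; for example, counting NodeManagers in the RUNNING state (the awk field position follows the column layout above):

```shell
# Count nodes whose second column is RUNNING.
yarn node -list 2>/dev/null | awk '$2 == "RUNNING"' | wc -l
```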

 

$ yarn node -status phddnb01.mydev.com:16638
15/06/05 14:31:03 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/10.193.68.131:8032
Node Report :
Node-Id : phddnb01.mydev.com:16638
Rack : /default-rack
Node-State : RUNNING
Node-Http-Address : phddnb01.mydev.com:8042
Last-Health-Update : Fri 05/Jun/15 02:29:06:575EDT
Health-Report :
Containers : 0
Memory-Used : 0MB
Memory-Capacity : 8192MB
CPU-Used : 0 vcores
CPU-Capacity : 8 vcores

$ yarn
Usage: yarn [--config confdir] COMMAND
where COMMAND is one of:
resourcemanager      run the ResourceManager
nodemanager          run a nodemanager on each slave
rmadmin              admin tools
version              print the version
jar <jar>            run a jar file
application          prints application(s) report/kill application
node                 prints node report(s)
logs                 dump container logs
classpath            prints the class path needed to get the
                     Hadoop jar and the required libraries
daemonlog            get/set the log level for each daemon
or
CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

$ yarn version
Hadoop 2.2.0-gphd-3.0.1.0
Source code repository: ssh://git@stash.greenplum.com:2222/phd/hadoop.git -r 3055df0b53cf992665913380a1651345c477a0d2
Compiled by pivotal on 2014-04-14T03:38Z
Compiled with protoc 2.5.0
From source with checksum 93b8d74f534acdc126e8575bba69fc70
This command was run using /usr/lib/gphd/hadoop/hadoop-common-2.2.0-gphd-3.0.1.0.jar

 

$ yarn rmadmin
Usage: java RMAdmin
[-refreshQueues]
[-refreshNodes]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshAdminAcls]
[-refreshServiceAcl]
[-getGroups [username]]
[-updateNodeResource [NodeID][MemSize][Cores]]
[-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

 

$ yarn rmadmin -getGroups gpadmin

15/06/18 11:37:27 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/192.168.68.135:8033
gpadmin : gpadmin hadoop

 

 
