Create a directory in HDFS
$ hdfs dfs -mkdir /user/mike
-bash-4.1$ hadoop fs -mkdir hdfs://sphdmst01.dev.com/user/ovi/test
-bash-4.1$ hadoop fs -ls hdfs://sphdmst01.dev.com/user/ovi/
Found 3 items
drwxr-xr-x - gpadmin hadoop 0 2015-07-23 16:20 hdfs://sphdmst01.dev.com/user/ovi/ovi.har
drwxr-xr-x - gpadmin hadoop 0 2015-07-24 11:59 hdfs://sphdmst01.dev.com/user/ovi/ovi2.har
drwxr-xr-x - gpadmin hadoop 0 2015-09-18 16:43 hdfs://sphdmst01.dev.com/user/ovi/test
Copy files from the local file system to the destination file system
$ hadoop fs -put test.txt /user/mike/
Download a file from HDFS to the local file system
$ hadoop fs -get /user/mike/test.txt /home
List the contents of a directory
$ hdfs dfs -ls /user/mike
Found 1 items
-rw-r--r-- 3 gpadmin hadoop 15 2015-06-04 11:04 /user/mike/test.txt
$ hdfs dfs -cat /user/mike/test.txt
just a test
$ hdfs dfs -rm /user/mike/test.txt
15/06/04 11:40:00 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 86400000 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://dev/user/mike/test.txt' to trash at: hdfs://dev/user/gpadmin/.Trash/Current
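The "Deletion interval = 86400000 minutes" line above is misleading: 86400000 is far too large to be minutes. The assumption here (this version of the trash log appears to print the raw internal value) is that the number is in milliseconds, which works out to a one-day retention window:

```shell
# Hypothetical sanity check: if 86400000 is milliseconds rather than
# minutes, it equals 1440 minutes, i.e. a 24-hour trash retention window.
interval_ms=86400000
minutes=$((interval_ms / 60000))
hours=$((minutes / 60))
echo "${minutes} minutes = ${hours} hours"   # -> 1440 minutes = 24 hours
```

That matches the common practice of setting fs.trash.interval (which is configured in minutes) to 1440.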
The getmerge command takes a source directory as input and concatenates the files in src into a single destination file on the local file system
$ hadoop fs -put test1.txt /user/mike
$ hadoop fs -put test2.txt /user/mike
$ hadoop fs -ls /user/mike
Found 2 items
-rw-r--r-- 3 gpadmin hadoop 26 2015-06-09 11:10 /user/mike/test1.txt
-rw-r--r-- 3 gpadmin hadoop 28 2015-06-09 11:10 /user/mike/test2.txt
$ hadoop fs -getmerge /user/mike /tmp/output.txt
$ more /tmp/output.txt
just a test
just a test
just a test2
just a test2
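getmerge itself needs a running cluster, but its effect is easy to reproduce locally: it is essentially a concatenation of every file under the source directory into one local file, taken in name order. A minimal local sketch (the temp directory and file contents are made up for the demo):

```shell
# Local analogue of 'hadoop fs -getmerge <srcdir> <localdst>':
# concatenate every file in a directory into a single output file.
demo=$(mktemp -d)
printf 'just a test\n'  > "$demo/test1.txt"
printf 'just a test2\n' > "$demo/test2.txt"
cat "$demo"/*.txt > "$demo/output.txt"   # glob expands in name order
cat "$demo/output.txt"
```

This mirrors why the merged output above lists test1.txt's lines before test2.txt's.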
Check file system
$ hadoop fsck /
…………………………………………………………………
Total size: 1252660561619 B
Total dirs: 784
Total files: 43391
Total symlinks: 0 (Files currently being written: 6)
Total blocks (validated): 23155 (avg. block size 54098922 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 23155 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Thu Jun 04 10:54:29 EDT 2015 in 1544 milliseconds
The filesystem under path '/' is HEALTHY
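The fsck summary is internally consistent: the "avg. block size" it reports is simply the total size divided by the number of validated blocks, and "Average block replication: 3.0" matches the default replication factor. Recomputing the average from the totals above (shell integer arithmetic, so the result is truncated):

```shell
# Recompute fsck's "avg. block size" from its own reported totals.
total_size=1252660561619   # Total size in bytes, from the report above
total_blocks=23155         # Total blocks (validated)
echo $((total_size / total_blocks))   # -> 54098922, as fsck reported
```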
To view a list of all the blocks, and the locations of the blocks, the command would be
$ hadoop fsck / -files -blocks -locations
$ hdfs dfs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
Hadoop dfsadmin Command Options
$ hdfs dfsadmin
Usage: java DFSAdmin
Note: Administrative commands can only be run as the HDFS superuser.
[-report]
[-safemode enter | leave | get | wait]
[-allowSnapshot <snapshotDir>]
[-disallowSnapshot <snapshotDir>]
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-finalizeUpgrade]
[-metasave filename]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-printTopology]
[-refreshNamenodes datanodehost:port]
[-deleteBlockPool datanode-host:port blockpoolId [force]]
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]
[-setSpaceQuota <quota> <dirname>...<dirname>]
[-clrSpaceQuota <dirname>...<dirname>]
[-setBalancerBandwidth <bandwidth in bytes per second>]
[-fetchImage <local directory>]
[-help [cmd]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
$ hdfs dfsadmin -safemode get
Safe mode is OFF
Hadoop haadmin Command Options
$ hdfs haadmin
Usage: DFSHAAdmin [-ns <nameserviceId>]
[-transitionToActive <serviceId>]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
[-help <command>]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
Example:
$ hdfs haadmin -getServiceState nn1
active
$ hdfs haadmin -getServiceState nn2
standby
$ hdfs haadmin -checkHealth nn1
$ hdfs haadmin -checkHealth nn2
[gpadmin@phdmst01 ~]$ hdfs getconf
hdfs getconf is utility for getting configuration information from the config file.
hadoop getconf
[-namenodes] gets list of namenodes in the cluster.
[-secondaryNameNodes] gets list of secondary namenodes in the cluster.
[-backupNodes] gets list of backup nodes in the cluster.
[-includeFile] gets the include file path that defines the datanodes that can join the cluster.
[-excludeFile] gets the exclude file path that defines the datanodes that need to decommissioned.
[-nnRpcAddresses] gets the namenode rpc addresses
[-confKey [key]] gets a specific key from the configuration
Example:
[gpadmin@phdmst01 ~]$ hdfs getconf -namenodes
phdmst01.mydev.com phdmst02.mydev.com
[gpadmin@phdmst01 ~]$ hdfs getconf -nnRpcAddresses
phdmst01.mydev.com:8020
phdmst02.mydev.com:8020
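The two RPC addresses correspond to the two HA NameNodes returned by -namenodes. When scripting against this output, each line splits cleanly on the colon into a host and a port; a small sketch using the sample addresses above:

```shell
# Split NameNode RPC addresses (host:port) into their parts.
addrs='phdmst01.mydev.com:8020
phdmst02.mydev.com:8020'
echo "$addrs" | awk -F: '{print "host=" $1 " port=" $2}'
# -> host=phdmst01.mydev.com port=8020
# -> host=phdmst02.mydev.com port=8020
```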
Yarn
$ yarn node -list
15/06/05 14:26:11 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/192.168.68.131:8032
Total Nodes:2
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
phddnb01.mydev.com:16638 RUNNING phddnb01.mydev.com:8042 0
phddna01.mydev.com:58002 RUNNING phddna01.mydev.com:8042 0
$ yarn node -status phddnb01.mydev.com:16638
15/06/05 14:31:03 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/10.193.68.131:8032
Node Report :
Node-Id : phddnb01.mydev.com:16638
Rack : /default-rack
Node-State : RUNNING
Node-Http-Address : phddnb01.mydev.com:8042
Last-Health-Update : Fri 05/Jun/15 02:29:06:575EDT
Health-Report :
Containers : 0
Memory-Used : 0MB
Memory-Capacity : 8192MB
CPU-Used : 0 vcores
CPU-Capacity : 8 vcores
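The node report is easy to post-process when capacity planning: each line is a "Field : value" pair, so a field such as Memory-Capacity can be pulled out with awk. A sketch against a saved copy of the report (the two-line report variable is a stand-in for the real output):

```shell
# Pull the memory capacity (in MB) out of a saved 'yarn node -status' report.
report='Memory-Used : 0MB
Memory-Capacity : 8192MB'
cap_mb=$(echo "$report" | awk -F' : ' '/Memory-Capacity/ {sub(/MB/, "", $2); print $2}')
echo "capacity: ${cap_mb} MB ($((cap_mb / 1024)) GB)"   # -> capacity: 8192 MB (8 GB)
```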
$ yarn
Usage: yarn [–config confdir] COMMAND
where COMMAND is one of:
resourcemanager run the ResourceManager
nodemanager run a nodemanager on each slave
rmadmin admin tools
version print the version
jar <jar> run a jar file
application prints application(s) report/kill application
node prints node report(s)
logs dump container logs
classpath prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
$ yarn version
Hadoop 2.2.0-gphd-3.0.1.0
Source code repository: ssh://git@stash.greenplum.com:2222/phd/hadoop.git -r 3055df0b53cf992665913380a1651345c477a0d2
Compiled by pivotal on 2014-04-14T03:38Z
Compiled with protoc 2.5.0
From source with checksum 93b8d74f534acdc126e8575bba69fc70
This command was run using /usr/lib/gphd/hadoop/hadoop-common-2.2.0-gphd-3.0.1.0.jar
$ yarn rmadmin
Usage: java RMAdmin
[-refreshQueues]
[-refreshNodes]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshAdminAcls]
[-refreshServiceAcl]
[-getGroups [username]]
[-updateNodeResource [NodeID][MemSize][Cores]]
[-help [cmd]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
$ yarn rmadmin -getGroups gpadmin
15/06/18 11:37:27 INFO client.RMProxy: Connecting to ResourceManager at phdmst03.mydev.com/192.168.68.135:8033
gpadmin : gpadmin hadoop