HDFS is designed to store and process large data sets (on the order of terabytes). Storing a large number of small files in HDFS is inefficient, because the NameNode keeps the metadata for every file and block in memory.
Hadoop Archives (HAR) can be used to address the namespace limitations associated with storing many small files. With HAR we can pack a number of small files into larger files while the original files remain transparently accessible.
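To make the namespace limitation concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (file or block). That figure and the function name namenode_bytes are assumptions for illustration, not values from this post or measurements from a cluster.

```shell
#!/bin/sh
# Rough NameNode heap estimate for N small files that each fit in one block.
# ASSUMPTION: ~150 bytes per namespace object; a rule of thumb, not a measurement.
namenode_bytes() {
    files=$1
    bytes_per_object=150
    # Each small file costs one file object plus one block object.
    echo $(( files * 2 * bytes_per_object ))
}

# Ten million small files:
namenode_bytes 10000000   # prints 3000000000 (about 3 GB)
```

Under that assumption, ten million single-block files cost roughly 3 GB of NameNode heap, no matter how little data they hold. Packing them into an archive leaves only the archive's index and part files in the namespace.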
You can use the following command to create a Hadoop archive:
hadoop archive -archiveName name -p <parent> <src>* <dest>
Example:
[gpadmin@]$ hadoop archive -archiveName ovi.har -p /user/ovidiu /user/ovi
[gpadmin@sphdmst02 ~]$ hadoop fs -ls /user/ovi
Found 1 items
drwxr-xr-x - gpadmin hadoop 0 2015-07-23 16:20 /user/ovi/ovi.har
The following example creates an archive using /user/ovidiu as the parent directory; the source paths are given relative to it.
The directories
/user/ovidiu/SIT1
/user/ovidiu/SIT2
/user/ovidiu/SIT3
will be archived in the /user/ovi/ovi2.har archive:
$ hadoop archive -archiveName ovi2.har -p /user/ovidiu/ SIT1 SIT2 SIT3 /user/ovi
[gpadmin@cmtolsphdmst02 ~]$ hadoop fs -ls /user/ovi
Found 2 items
drwxr-xr-x - gpadmin hadoop 0 2015-07-23 16:20 /user/ovi/ovi.har
drwxr-xr-x - gpadmin hadoop 0 2015-07-24 11:59 /user/ovi/ovi2.har
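The two invocations above differ only in whether source paths are listed after the parent directory. A minimal sketch of a wrapper that assembles the command line is shown here; har_cmd is a hypothetical helper name, and it only prints the command so it can be reviewed before being run on a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints the `hadoop archive`
# command for the given arguments instead of running it.
# Usage: har_cmd ARCHIVE_NAME PARENT_DIR DEST_DIR [SRC...]
har_cmd() {
    name=$1; parent=$2; dest=$3
    shift 3
    case $name in
        *.har) ;;   # archive names must end in .har
        *) echo "archive name must end in .har" >&2; return 1 ;;
    esac
    # Remaining arguments are source paths relative to the parent directory;
    # with none given, the parent directory itself is archived.
    if [ $# -gt 0 ]; then
        echo "hadoop archive -archiveName $name -p $parent $* $dest"
    else
        echo "hadoop archive -archiveName $name -p $parent $dest"
    fi
}

har_cmd ovi.har /user/ovidiu /user/ovi
har_cmd ovi2.har /user/ovidiu /user/ovi SIT1 SIT2 SIT3
```

The destination is taken as the third argument so that any number of source paths can follow it; the printed command still places the destination last, as the tool expects.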
Looking up files in Hadoop archives
To a client using the HAR filesystem, nothing has changed: the original files are still visible and accessible, albeit through a har:// URL.
[gpadmin@sphdmst02 ~]$ hdfs dfs -ls har:///user/ovi/ovi.har/
Found 3 items
-rw-r--r-- 2 gpadmin hadoop 125 2015-07-23 16:19 har:///user/ovi/ovi.har/ranking.txt
-rw-r--r-- 2 gpadmin hadoop 66 2015-01-14 16:06 har:///user/ovi/ovi.har/test.txt
-rw-r--r-- 2 gpadmin hadoop 13 2015-07-23 16:18 har:///user/ovi/ovi.har/test2.txt
[gpadmin@sphdmst02 ~]$ hdfs dfs -ls har:///user/ovi/ovi2.har/
Found 3 items
drwxr-xr-x - gpadmin hadoop 0 2015-07-24 11:54 har:///user/ovi/ovi2.har/SIT1
drwxr-xr-x - gpadmin hadoop 0 2015-07-24 11:55 har:///user/ovi/ovi2.har/SIT2
drwxr-xr-x - gpadmin hadoop 0 2015-07-24 11:55 har:///user/ovi/ovi2.har/SIT3
[gpadmin@cmtolsphdmst02 ~]$ hdfs dfs -ls har:///user/ovi/ovi2.har/SIT1
Found 2 items
-rw-r--r-- 2 gpadmin hadoop 125 2015-07-24 11:52 har:///user/ovi/ovi2.har/SIT1/ranking.txt
-rw-r--r-- 2 gpadmin hadoop 13 2015-07-24 11:54 har:///user/ovi/ovi2.har/SIT1/test2.txt
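The listings above all use the same URL pattern: har:// followed by the HDFS path of the archive and then the path inside it. A small sketch of that mapping follows; har_uri is a hypothetical helper name, and the destination path in the commented copy command is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: builds the har:// URI for a path inside an archive.
# Usage: har_uri ARCHIVE_PATH [PATH_INSIDE_ARCHIVE]
har_uri() {
    archive=${1%/}   # drop one trailing slash, if any
    inner=${2#/}     # drop one leading slash, if any
    echo "har://${archive}/${inner}"
}

har_uri /user/ovi/ovi.har ranking.txt
har_uri /user/ovi/ovi2.har SIT1/test2.txt

# With a cluster available, the URI can be passed to the usual commands, e.g.:
#   hdfs dfs -cat "$(har_uri /user/ovi/ovi.har ranking.txt)"
#   hdfs dfs -cp  "$(har_uri /user/ovi/ovi2.har SIT1)" /user/ovidiu/SIT1_restored
# (/user/ovidiu/SIT1_restored is an assumed destination, not from the examples above.)
```

Copying out of the har:// URL with hdfs dfs -cp is also the way to extract files from an archive, since archives cannot be modified once they are created.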