Why Snapshots Are NOT Ideal for Migration
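- Snapshots stay within the same cluster: an HDFS snapshot is a read-only, point-in-time view kept under the directory's .snapshot path on the source filesystem, so it cannot be copied to another cluster as a snapshot.
- No built-in transfer mechanism: creating a snapshot copies no data blocks (it only records metadata), so you still need another tool, such as DistCp, to move the data.
- No cross-cluster restore: a snapshot can only be restored inside the filesystem that holds it, and snapshot implementations are filesystem-specific (e.g., HDFS directory snapshots vs. MapR-FS volume snapshots).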
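Snapshots still pair well with DistCp: copying from a snapshot's read-only path means the source cannot change while the copy runs. A minimal sketch, assuming /data is already snapshottable on the source; the helper names run and snapshot_copy, the snapshot name, and the cluster URIs are hypothetical placeholders. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, snapshot name and URIs are hypothetical.

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# snapshot_copy SRC_URI DIR SNAPSHOT DST_URI
# Freezes DIR on the source, then copies the frozen view to the target.
snapshot_copy() {
  local src="$1" dir="$2" snap="$3" dst="$4"
  # Creating a snapshot copies no blocks; it records a read-only view.
  run hdfs dfs -createSnapshot "${src}${dir}" "$snap" || return
  # Read from .snapshot/<name> so writes to the live directory during
  # the copy cannot leave a half-updated state on the target. With
  # -update, the snapshot's contents land directly under DST_URI/DIR.
  run hadoop distcp -update "${src}${dir}/.snapshot/${snap}" "${dst}${dir}"
}
```

For example, DRY_RUN=1 snapshot_copy hdfs://source-cluster /data mig_s1 hdfs://destination-cluster prints both commands without touching either cluster.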
Best Way to Migrate Data Between Clusters
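Use DistCp for large-scale data migration; it copies files in parallel as a MapReduce job, and -update copies only files that are missing on the target or differ in size or checksum: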
hadoop distcp -update hdfs://source-cluster/data hdfs://destination-cluster/data
Or, for migration to cloud object storage (here Amazon S3 via the s3a connector):
hadoop distcp hdfs://source/data s3a://bucket-name/
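A quick independent check after a copy is to compare the directory count, file count, and byte total that hdfs dfs -count reports for the source and target paths. A rough sketch under that assumption: matching totals are a sanity check, not a proof that contents are identical. The helper names path_counts and verify_copy are hypothetical. On object stores such as S3A, directories are emulated, so directory counts may legitimately differ there.

```shell
#!/usr/bin/env bash
# Sketch only: compares `hdfs dfs -count` totals for two paths.
# Matching totals are a sanity check, not a content/checksum comparison.

# Print "DIR_COUNT FILE_COUNT CONTENT_SIZE" for one path.
path_counts() {
  local out
  out="$(hdfs dfs -count "$1")" || return
  # -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  printf '%s\n' "$out" | awk '{print $1, $2, $3}'
}

# verify_copy SRC DST: returns 0 if totals match, 1 if they differ,
# 2 if either count command fails.
verify_copy() {
  local src_counts dst_counts
  src_counts="$(path_counts "$1")" || return 2
  dst_counts="$(path_counts "$2")" || return 2
  if [ "$src_counts" = "$dst_counts" ]; then
    echo "OK: $src_counts"
  else
    echo "MISMATCH: source=[$src_counts] target=[$dst_counts]" >&2
    return 1
  fi
}
```

For example, verify_copy hdfs://source-cluster/data hdfs://destination-cluster/data after the -update copy. Note that files left on the target from earlier runs also show up as a mismatch.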
Create & Restore a Snapshot (HDFS; MapR-FS uses maprcli volume snapshots instead)
hdfs dfsadmin -allowSnapshot /mydata
hdfs dfs -createSnapshot /mydata my_snapshot
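# HDFS has no -restoreSnapshot command; restore by copying back from the read-only .snapshot directory
hdfs dfs -cp -f -p '/mydata/.snapshot/my_snapshot/*' /mydata/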
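HDFS restores data by copying files back out of the read-only .snapshot directory. Before overwriting anything, hdfs snapshotDiff can list what changed since the snapshot, with "." standing for the directory's current state. A sketch with a hypothetical restore_from_snapshot helper; copying back overwrites modified files but does not delete files created after the snapshot was taken.

```shell
#!/usr/bin/env bash
# Sketch only: restore_from_snapshot is a hypothetical helper name.

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# restore_from_snapshot DIR SNAPSHOT
restore_from_snapshot() {
  local dir="$1" snap="$2"
  # Report changes between the snapshot and the current state (".").
  run hdfs snapshotDiff "$dir" "$snap" . || return
  # Copy the frozen files back over the live ones: -f overwrites,
  # -p keeps timestamps, ownership and permissions. The glob is quoted
  # so HDFS expands it rather than the local shell.
  run hdfs dfs -cp -f -p "${dir}/.snapshot/${snap}/*" "${dir}/"
}
```

Running it first with DRY_RUN=1 shows exactly which commands would execute.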
DistCp for Data Migration (mirror mode: -delete also removes target files that no longer exist on the source)
hadoop distcp -update -delete hdfs://source-cluster/data hdfs://destination-cluster/data
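For repeated syncs, DistCp can apply an HDFS snapshot diff instead of rescanning every file, via its -diff option (valid only together with -update). A sketch assuming the preconditions DistCp requires for -diff: both sides are HDFS, the target already holds snapshot OLD from an earlier full copy, and nothing has written to the target since. The incremental_sync helper and the snapshot names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: incremental_sync and the snapshot names are hypothetical.
# Assumed: DIR is snapshottable on both clusters, both hold snapshot OLD,
# and the target has not been modified since OLD was taken.

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# incremental_sync SRC_URI DST_URI DIR OLD NEW
incremental_sync() {
  local src="$1" dst="$2" dir="$3" old="$4" new="$5"
  # 1. Freeze the current source state as NEW.
  run hdfs dfs -createSnapshot "${src}${dir}" "$new" || return
  # 2. Apply only the OLD->NEW changes (renames, deletes, new/modified files).
  run hadoop distcp -update -diff "$old" "$new" "${src}${dir}" "${dst}${dir}" || return
  # 3. Mark the target with NEW so the next run can diff from it.
  run hdfs dfs -createSnapshot "${dst}${dir}" "$new"
}
```

A first cycle would be a full copy from a source snapshot (say s1) followed by creating s1 on the target; each later cycle passes the previous NEW as OLD.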