The Hadoop -getmerge command merges multiple files stored in HDFS (Hadoop Distributed File System) and writes the result to a single output file on the local file system.
In this example, we merge two files stored in our HDFS, file1.txt and file2.txt, into a single file, output.txt, on our local file system.
Steps To Use -getmerge Command
Step 1: Let’s look at the content of file1.txt and file2.txt, which are available in our HDFS. You can see the content of file1.txt in the image below:
Content of file2.txt:
In this case, we have copied both files into the Hadoop_File folder in our HDFS. If you don’t know how to create the directory and copy files to HDFS, use the commands below.
- Making the Hadoop_File directory in our HDFS
hdfs dfs -mkdir /Hadoop_File
- Copying files to HDFS
hdfs dfs -copyFromLocal /home/dikshant/Documents/hadoop_file/file1.txt /home/dikshant/Documents/hadoop_file/file2.txt /Hadoop_File
The image below shows these files inside the /Hadoop_File directory in HDFS.
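If you want to confirm the same thing from the terminal, the listing and the file contents can be printed with `hdfs dfs -ls` and `hdfs dfs -cat`. The snippet below is a minimal sketch that assumes the /Hadoop_File directory and file names used in this article; the `hdfs_check` variable is only an illustrative name, and the guard simply skips the commands when no `hdfs` binary is on the PATH.

```shell
# Sketch: inspect the input files in HDFS before merging.
# Assumes /Hadoop_File/file1.txt and /Hadoop_File/file2.txt exist as described above.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls /Hadoop_File              # list the directory
  hdfs dfs -cat /Hadoop_File/file1.txt   # print the first input file
  hdfs dfs -cat /Hadoop_File/file2.txt   # print the second input file
  hdfs_check="ran"
else
  echo "hdfs not found on PATH; skipping the check" >&2
  hdfs_check="skipped"
fi
```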
Step 2: Now it’s time to use the -getmerge command to merge these files into a single output file on our local file system. The general syntax is:
hdfs dfs -getmerge -nl /path1 /path2 ... /pathN /destination
The source paths are HDFS paths, and the last argument is the destination file on the local file system. The optional -nl flag adds a newline character at the end of each file, so the contents of the n files do not run together on the same line. In this case, we merge the files into the hadoop_file folder inside our Documents folder.
hdfs dfs -getmerge -nl /Hadoop_File/file1.txt /Hadoop_File/file2.txt /home/dikshant/Documents/hadoop_file/output.txt
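To make the effect of -nl concrete, the following sketch mimics the merge with ordinary local files, so it runs without a Hadoop installation. It is only an approximation of what -getmerge does, not the Hadoop implementation; the temporary directory, the sample contents, and the variable names are assumptions made for illustration.

```shell
# Local sketch of what "-getmerge -nl" produces: each input is copied
# in order, and a newline is appended after each one.
workdir="$(mktemp -d)"
printf 'Hello from file1' > "$workdir/file1.txt"   # sample input, no trailing newline
printf 'Hello from file2' > "$workdir/file2.txt"   # sample input, no trailing newline

merged="$workdir/output.txt"
: > "$merged"                                      # start with an empty output file
for f in "$workdir/file1.txt" "$workdir/file2.txt"; do
  cat "$f" >> "$merged"                            # append the file content
  printf '\n' >> "$merged"                         # what -nl adds after each file
done

cat "$merged"
```

With these sample inputs, the output file holds each file's content on its own line. Without the `printf '\n'` step (that is, without -nl), the two strings would be joined on a single line. If an input already ends with a newline, the extra newline shows up as a blank line after that file's content.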
Now let’s check whether the files were merged into output.txt.
In the above image, you can see that the files were merged successfully into output.txt.
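The same check can be done from the terminal. The helper below is a small sketch, and `verify_merged` is a hypothetical name introduced here, not part of Hadoop. It only confirms that the merged file exists, is non-empty, and reports its line count. The demo runs it on a throwaway file so that it works without Hadoop.

```shell
# Hypothetical helper: confirm a merged file exists and is non-empty.
verify_merged() {
  merged_path="$1"
  if [ -s "$merged_path" ]; then
    echo "OK: $merged_path has $(wc -l < "$merged_path" | tr -d ' ') line(s)"
  else
    echo "Missing or empty: $merged_path" >&2
    return 1
  fi
}

# Demo on a temporary file standing in for output.txt.
demo_file="$(mktemp)"
printf 'line one\nline two\n' > "$demo_file"
verify_merged "$demo_file"
```

For the example in this article, you would call it as `verify_merged /home/dikshant/Documents/hadoop_file/output.txt`, and `cat` on the same path prints the merged content.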