Counting the number of words is a piece of cake in languages such as C, C++, Python or Java. MapReduce programs are also written in Java, and they are easy to write once you know the syntax. Word Count is the basic MapReduce program: it is usually the first one you learn to execute, much like the “Hello World” program in other languages. Here are the steps to write and run a MapReduce program for Word Count.
Example input: Hello I am GeeksforGeeks Hello I am an Intern
Expected output: GeeksforGeeks 1 Hello 2 I 2 Intern 1 am 2 an 1
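For comparison, here is the same count as an ordinary single-machine Java program, without Hadoop. It is a minimal sketch: the class name LocalWordCount is made up for illustration, words are assumed to be separated by whitespace, and a TreeMap keeps the words sorted. Because Java's string ordering puts uppercase letters before lowercase ones for ASCII text, the result comes out in the same order as the output above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java (non-Hadoop) word count on a single machine.
class LocalWordCount {

    // Splits the text on runs of whitespace and counts each word.
    // TreeMap keeps keys sorted, so "GeeksforGeeks" comes before "am".
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) { // split on an empty string yields [""]
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Formats the counts as "word count" pairs separated by spaces.
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(' ').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "Hello I am GeeksforGeeks Hello I am an Intern";
        System.out.println(format(count(input)));
    }
}
```

Running main prints GeeksforGeeks 1 Hello 2 I 2 Intern 1 am 2 an 1. This is fine while the text fits on one machine; MapReduce splits the same work into a map step and a reduce step so it can run over files stored in HDFS.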
- First open Eclipse, then select File -> New -> Java Project, name it WordCount and click Finish.
- Create three Java classes in the project and name them WCDriver (containing the main function), WCMapper and WCReducer.
- You have to include two reference libraries (JAR files) in the project:
Right-click on the project, select Build Path and click on Configure Build Path.
As shown in the figure above, the Add External JARs option is on the right-hand side. Click on it and add the two JAR files. You can find these files in /usr/lib/.
Mapper Code: Copy and paste the mapper program into the WCMapper Java class file.
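The mapper's job is to turn each line of input into (word, 1) pairs. With Hadoop's default text input format, each record handed to the mapper is one line of the file. The sketch below only simulates that logic in plain Java so you can try it without a cluster; it does not use the Hadoop Mapper interface, the class name MapPhaseSketch is hypothetical, and words are assumed to be separated by whitespace.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java simulation of the map step (not the Hadoop Mapper API).
class MapPhaseSketch {

    // For one input line, emit a (word, 1) pair for every word in it.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Print each emitted pair as "word<TAB>1".
        for (Map.Entry<String, Integer> p : map("Hello I am GeeksforGeeks")) {
            System.out.println(p.getKey() + "\t" + p.getValue());
        }
    }
}
```

For the line Hello I am GeeksforGeeks it emits four pairs, each with the value 1. In a real Hadoop mapper these pairs are typically written out as Text and IntWritable objects.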
Reducer Code: Copy and paste the reducer program into the WCReducer Java class file.
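After the map step, Hadoop groups all pairs that share the same word, so the reducer receives one word together with all of its 1s and adds them up. Again this is a plain-Java simulation rather than the Hadoop Reducer API, and ReducePhaseSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java simulation of the reduce step (not the Hadoop Reducer API).
class ReducePhaseSketch {

    // Receives one word (the key, unused in the arithmetic) and every count
    // emitted for it by the mappers, and returns their sum.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // "Hello" appears twice in the sample input, so it arrives with two 1s.
        System.out.println("Hello\t" + reduce("Hello", Arrays.asList(1, 1)));
    }
}
```

Summing the values, rather than just counting how many there are, keeps the reducer correct if a combiner has already added up some of the 1s on the map side.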
Driver Code: Copy and paste the driver program into the WCDriver Java class file.
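The driver configures the job (mapper class, reducer class, input and output paths) and submits it; Hadoop then runs the map, shuffle/sort and reduce phases. To show what that whole pipeline computes, here is a single-process simulation in plain Java. It is only a sketch: it uses no Hadoop classes, the class name WordCountJobSketch is hypothetical, and the sample input is assumed to be split over two lines for illustration. Each output line puts a tab between the word and its count, the default separator of Hadoop's text output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of a word-count job: map -> shuffle/sort -> reduce.
class WordCountJobSketch {

    static List<String> run(List<String> inputLines) {
        // Map + shuffle: emit (word, 1) for every word and group the 1s by word.
        // TreeMap sorts the keys, mirroring the sort Hadoop performs before reduce.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : inputLines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }
        // Reduce: sum the grouped values for each word and format "word<TAB>count".
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int c : e.getValue()) {
                sum += c;
            }
            output.add(e.getKey() + "\t" + sum);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Hello I am GeeksforGeeks", "Hello I am an Intern");
        run(input).forEach(System.out::println);
    }
}
```

The six lines it prints (GeeksforGeeks 1, Hello 2, I 2, Intern 1, am 2, an 1) match the expected output at the top of this article.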
- Now make a jar file: right-click on the project, click Export, select JAR file as the export destination, name the jar file WordCount.jar, click Next and finally click Finish. Then copy this file into the workspace directory of Cloudera.
- Open the terminal on CDH and change to the workspace directory with the “cd workspace/” command. Create a text file (WCFile.txt) containing the input text. Remember that you should be in the same directory as the jar file you just created.
Now run this command to copy the input file into HDFS:
hadoop fs -put WCFile.txt WCFile.txt
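- Now run the jar file as shown in the screenshot. The general form is hadoop jar <jar file> <main class> <arguments>; assuming WCDriver takes the input file and the output directory as its two arguments, the command is:
hadoop jar WordCount.jar WCDriver WCFile.txt WCOutput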
- After the job finishes, you can see the result in the WCOutput directory or by running the following command in the terminal:
hadoop fs -cat WCOutput/part-00000
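Each line of the part file holds a word, a tab and its count. If you want to check the result in a program rather than by eye (for example, after saving the command's output to a local file), a small parser is enough. This is a sketch: WordCountOutputParser is a hypothetical name, and it assumes the default tab separator.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for "word<TAB>count" lines such as those printed by hadoop fs -cat.
class WordCountOutputParser {

    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the file's order
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. from a trailing newline
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("no tab separator in: " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("GeeksforGeeks\t1", "Hello\t2");
        System.out.println(parse(lines)); // {GeeksforGeeks=1, Hello=2}
    }
}
```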
Improved By : maxkhkh5