How to Execute Character Count Program in MapReduce Hadoop?

Prerequisites: Hadoop and MapReduce

The following setup is required to complete this task:

  1. Java Installation
  2. Hadoop installation 

Our task is to count the frequency of each character present in our input file. We will implement it in Java, though a MapReduce program can also be written in Python or C++. Follow the steps below to find the number of occurrences of each character.

Example:

Input

GeeksforGeeks 

Output

G  2
e  4
f  1
k  2
o  1
r  1
s  2
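Before running the job, the expected counts can be sanity-checked with a few lines of plain Java. This is a standalone sketch for verification only, not part of the MapReduce code; a TreeMap keyed on the character reproduces both the counts and the sorted order that the job's output will have.

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalCharCount {
    public static void main(String[] args) {
        String input = "GeeksforGeeks";
        // TreeMap keeps keys in natural (character) order,
        // matching the sorted order of the MapReduce output
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : input.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        counts.forEach((c, n) -> System.out.println(c + " " + n));
    }
}
```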

Step 1: First open Eclipse -> then select File -> New -> Java Project -> name it CharCount -> then select Use an execution environment -> choose JavaSE-1.8 -> then Next -> Finish.



Step 2: Create three Java classes in the project. Name them CharCountDriver (containing the main function), CharCountMapper, and CharCountReducer.

Mapper Code: Copy the program below into the CharCountMapper Java class file.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CharCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        // Split the line into single-character tokens
        String[] tokenizer = line.split("");
        for (String singleChar : tokenizer) {
            // Emit each character with a count of 1
            output.collect(new Text(singleChar), new IntWritable(1));
        }
    }
}
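A side note on the mapper: it relies on String.split("") to break a line into characters. On Java 8 (the execution environment chosen in Step 1) this yields one token per character with no leading empty string, though older Java versions behaved differently. A quick standalone check:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String[] tokens = "GeeksforGeeks".split("");
        // On Java 8+, every token is exactly one character
        System.out.println(tokens.length); // 13
        System.out.println(tokens[0]);     // G
    }
}
```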



Reducer Code: Copy the program below into the CharCountReducer Java class file.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CharCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        // Sum all the counts emitted for this character
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
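The reduce logic itself is just a sum over the iterator of counts. The snippet below imitates it outside Hadoop with plain ints, as an illustration only; in the real job the values arrive as IntWritable objects from the shuffle phase.

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReduceDemo {
    public static void main(String[] args) {
        // Simulate the values the reducer would receive for the key "e"
        Iterator<Integer> values = Arrays.asList(1, 1, 1, 1).iterator();
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        System.out.println("e " + sum);
    }
}
```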



Driver Code: Copy the program below into the CharCountDriver Java class file.


import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class CharCountDriver {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(CharCountDriver.class);
        conf.setJobName("CharCount");
        // Output key/value types produced by the mapper and reducer
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(CharCountMapper.class);
        // The reducer doubles as a combiner, since summing is associative
        conf.setCombinerClass(CharCountReducer.class);
        conf.setReducerClass(CharCountReducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // args[0]: input path in HDFS, args[1]: output directory
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}



Step 3: Now we need to add external jars for the packages we have imported. Download the Hadoop Common and Hadoop MapReduce Core jar packages matching your Hadoop version. You can check your Hadoop version with the command below:

hadoop version

Step 4: Now add these external jars to our CharCount project. Right-click on CharCount -> then select Build Path -> click Configure Build Path, select Add External JARs... and add the jars from their download location, then click Apply and Close.

Step 5: Now export the project as a jar file. Right-click on CharCount, choose Export... and go to Java -> JAR file, click Next and choose your export destination, then click Next. Choose the Main Class as CharCountDriver (the class with the main function) by clicking Browse, then click Finish -> Ok.

The jar file is now successfully created and saved, in my case in the /Documents directory with the name charectercount.jar.

Step 6: Create a simple text file and add some data to it.

nano test.txt

You can add the text manually in nano, or use another editor such as Vim or gedit.
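For example, the sample input from earlier can be written non-interactively with a one-liner instead of opening an editor:

```shell
# Write the sample input to test.txt and display it
echo "GeeksforGeeks" > test.txt
cat test.txt
```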

To see the content of the file, use the cat command available in Linux.



cat test.txt

Step 7: Start our Hadoop Daemons

start-dfs.sh
start-yarn.sh

Step 8: Move your test.txt file to the Hadoop HDFS.

Syntax:

hdfs dfs -put /file_path /destination

In the command below, / denotes the root directory of our HDFS.

hdfs dfs -put /home/dikshant/Documents/test.txt /

Check whether the file is present in the root directory of HDFS:

hdfs dfs -ls /

Step 9: Now run your jar file with the command below, writing the output to the CharCountResult directory.

Syntax:

hadoop jar /jar_file_location /dataset_location_in_HDFS /output-file_name

Command:

hadoop jar /home/dikshant/Documents/charectercount.jar /test.txt /CharCountResult

Step 10: Now go to localhost:50070/, under Utilities select Browse the file system, and download the part-00000 file from the /CharCountResult directory to see the result. We can also check the result, i.e. that part-00000 file, with the cat command as shown below.

hdfs dfs -cat /CharCountResult/part-00000 



