Job Initialisation in MapReduce

When the resource manager receives a call to its submitApplication() method, it hands the request off to the YARN scheduler. The scheduler allocates a container, and the resource manager then launches the application master's process there, under the node manager's management. The application master for MapReduce jobs is a Java application whose main class is MRAppMaster. It initializes the job by creating a number of bookkeeping objects that keep track of the job's progress, since it will receive progress and completion reports from the tasks.
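A minimal driver sketch of the step that triggers this flow is shown below; the job name and the input/output paths are placeholders, and no Mapper or Reducer class is set, so the identity defaults apply. Calling waitForCompletion() submits the job, which ends in the submitApplication() call that the resource manager receives.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InitDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "init-demo");   // job name is a placeholder
        job.setJarByClass(InitDemo.class);

        // Placeholder paths on the shared filesystem (HDFS)
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));

        // Submits the job and polls for progress; submitApplication() on the
        // resource manager is invoked as part of this call, after which the
        // application master (MRAppMaster) is launched in a container.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}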
The next step is to retrieve the input splits, which were computed in the client, from the shared filesystem. The application master then creates a map task object for each split, as well as a number of reduce task objects determined by the mapreduce.job.reduces property (which is set by the setNumReduceTasks() method on Job). Tasks are given IDs at this point, and the application master decides how to run the tasks that make up the MapReduce job.
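The reduce count can be fixed either through the Job API or through the property it sets; the value 2 below is purely illustrative.

// Two equivalent ways to set the number of reduce task objects the
// application master will create:
job.setNumReduceTasks(2);                                   // via the Job API
job.getConfiguration().setInt("mapreduce.job.reduces", 2);  // via the underlying property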
If the job is small, the application master may choose to run the tasks in the same JVM as itself. It does so when it judges that the overhead of allocating containers and running the tasks in them would outweigh the gain of running the tasks in parallel. Such a job is said to be uberised, or run as an uber task.

By default, a small job is one that has fewer than 10 mappers, only one reducer, and an input size smaller than one HDFS block. These values may be changed for a job by setting mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes.
Uber tasks must be enabled explicitly (for an individual job, or across the cluster) by setting mapreduce.job.ubertask.enable to true. Finally, before any tasks can be run, the application master calls the setupJob() method on the OutputCommitter.
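A sketch of enabling uber mode per job, using the properties named above; the threshold values are illustrative, not recommendations.

Configuration conf = job.getConfiguration();
conf.setBoolean("mapreduce.job.ubertask.enable", true);  // uber mode must be switched on explicitly
conf.setInt("mapreduce.job.ubertask.maxmaps", 9);        // at most 9 map tasks
conf.setInt("mapreduce.job.ubertask.maxreduces", 1);     // at most 1 reduce task
conf.setLong("mapreduce.job.ubertask.maxbytes",
             128L * 1024 * 1024);                        // e.g. one 128 MB block of input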
For the default, FileOutputCommitter, this creates the final output directory for the job and the temporary working space for the task output.
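The directory the committer works under is simply the job's configured output path; the path below is a placeholder.

// FileOutputCommitter operates under the job's output path: task output is
// written to a _temporary working area beneath it and promoted when the
// task and job are committed.
FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));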


