A MapReduce job can be run with a single method call: submit() on a Job object. You can also call waitForCompletion(), which submits the job if it hasn't been submitted already and then waits for it to finish.
Let's look at the components involved:
- Client: Submits the MapReduce job.
- YARN node manager: Launches and monitors the compute containers on machines in the cluster.
- YARN resource manager: Coordinates the allocation of compute resources on the cluster.
- MapReduce application master: Coordinates the tasks running the MapReduce job.
- Distributed filesystem: Shares job files between the other entities.
How is a job submitted?
The submit() method on Job creates an internal JobSubmitter instance and calls submitJobInternal() on it. Having submitted the job, waitForCompletion() polls the job's progress once per second and reports it to the console if it has changed since the last report. When the job completes successfully, the job counters are displayed. Otherwise, the error that caused the job to fail is logged to the console.
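The "report only on change" behaviour of the polling loop can be sketched in plain Java. This is a simplified model, not Hadoop's implementation: the class ProgressPollSketch, the method pollAndReport(), and the progress strings are hypothetical names chosen for illustration, and the progress source is just an iterator standing in for calls to the cluster.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Simplified stand-in for a progress-polling loop; not Hadoop code. */
final class ProgressPollSketch {

    /**
     * Polls the source until it is exhausted and reports a value only when it
     * differs from the last reported one. Returns everything that was reported.
     */
    static List<String> pollAndReport(Iterator<String> progressSource, long pollIntervalMillis) {
        List<String> reports = new ArrayList<>();
        String lastReport = null;
        while (progressSource.hasNext()) {
            String current = progressSource.next();
            if (!current.equals(lastReport)) {
                // Progress changed since the last report, so print it.
                System.out.println(current);
                reports.add(current);
                lastReport = current;
            }
            if (progressSource.hasNext()) {
                try {
                    Thread.sleep(pollIntervalMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop polling.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        // Hypothetical sequence of polled progress values, with repeats.
        List<String> polled = List.of(
                "map 0% reduce 0%", "map 0% reduce 0%",
                "map 50% reduce 0%", "map 100% reduce 0%",
                "map 100% reduce 0%", "map 100% reduce 100%");
        // Interval shortened for the demo; the text describes one poll per second.
        pollAndReport(polled.iterator(), 100L);
    }
}
```

Running main prints four lines, one for each distinct progress value, even though the source was polled six times.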
Steps performed by JobSubmitter when submitting the job:
- Asks the resource manager for a new application ID, which is used as the MapReduce job ID.
- Checks the output specification of the job. For example, if the output directory has not been specified or it already exists, the job is not submitted and an error is thrown to the MapReduce program.
- Computes the input splits for the job. If the splits cannot be computed (because the input paths don't exist, for example), the job is not submitted and an error is thrown to the MapReduce program.
- Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the shared filesystem in a directory named after the job ID.
- Copies the job JAR with a high replication factor, controlled by the mapreduce.client.submit.file.replication property (10 by default), so that there are many copies across the cluster for the node managers to access.
- Submits the job by calling submitApplication() on the resource manager.
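The fail-fast order of these checks can be illustrated with a small local-filesystem model. This is a sketch under stated assumptions, not Hadoop's JobSubmitter: the class JobSubmitSketch and its methods are hypothetical, java.nio.file stands in for the shared filesystem, the job ID is passed in rather than requested from a resource manager, and the final submitApplication() call is not modelled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Simplified local-filesystem model of the submission checks; not Hadoop code. */
final class JobSubmitSketch {

    /** Refuses to continue if the output directory is unset or already exists. */
    static void checkOutputSpec(Path outputDir) {
        if (outputDir == null) {
            throw new IllegalStateException("Output directory not set");
        }
        if (Files.exists(outputDir)) {
            throw new IllegalStateException("Output directory " + outputDir + " already exists");
        }
    }

    /** Stand-in for split computation: every existing input file counts as one split. */
    static List<Path> computeSplits(List<Path> inputFiles) {
        for (Path input : inputFiles) {
            if (!Files.isRegularFile(input)) {
                throw new IllegalStateException("Input path does not exist: " + input);
            }
        }
        return List.copyOf(inputFiles);
    }

    /**
     * Runs the checks in order, then stages a placeholder file in a directory
     * named after the job ID. Nothing is staged if a check fails.
     */
    static Path submit(String jobId, Path stagingRoot, Path outputDir, List<Path> inputFiles) {
        checkOutputSpec(outputDir);
        List<Path> splits = computeSplits(inputFiles);
        try {
            Path jobDir = Files.createDirectories(stagingRoot.resolve(jobId));
            // Placeholder for copying the JAR, configuration, and split metadata.
            Files.writeString(jobDir.resolve("job.split"), splits.size() + " splits\n");
            return jobDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("submit-sketch");
        Path input = Files.writeString(work.resolve("input.txt"), "hello\n");
        // "job_0001" is a made-up ID standing in for the one the resource manager would issue.
        Path jobDir = submit("job_0001", work.resolve("staging"), work.resolve("out"), List.of(input));
        System.out.println("Staged job files in " + jobDir);
    }
}
```

Because both checks run before anything is written, a missing output setting, an existing output directory, or a missing input stops the submission without leaving staged files behind.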