A MapReduce job can be run with a single method call: submit() on a Job object (you can also call waitForCompletion(), which submits the job if it hasn't been submitted already, then waits for it to finish).
Let's understand the components involved:
- Client : Submits the MapReduce job.
- YARN node manager : Launches and monitors the compute containers on machines in the cluster.
- YARN resource manager : Coordinates the allocation of compute resources on the cluster.
- MapReduce application master : Coordinates the tasks running the MapReduce job.
- Distributed filesystem : Used for sharing job files between the other entities.
How is a Job submitted?
The submit() method on Job creates an internal JobSubmitter instance and calls submitJobInternal() on it. Having submitted the job, waitForCompletion() polls the job's progress once per second and reports the progress to the console if it has changed since the last report. When the job completes successfully, the job counters are displayed. Otherwise, the error that caused the job to fail is logged to the console.
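The polling behaviour described above can be sketched in plain Java. This is a simplified model, not Hadoop's Job class: the names ProgressPollSketch, pollUntilDone, progressSource and isDone are hypothetical, and the progress strings are made up for illustration. It only shows the idea of polling at a fixed interval and printing a report when it differs from the previous one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Simplified stand-in for the polling loop; this is NOT Hadoop's Job class.
class ProgressPollSketch {

    /**
     * Polls progressSource until isDone returns true and returns the lines
     * that would be written to the console. A line is recorded only when
     * the report differs from the previous one.
     */
    static List<String> pollUntilDone(Supplier<String> progressSource,
                                      BooleanSupplier isDone,
                                      long pollIntervalMillis) {
        List<String> consoleLines = new ArrayList<>();
        String lastReport = null;
        while (!isDone.getAsBoolean()) {
            String report = progressSource.get();
            if (!report.equals(lastReport)) {
                consoleLines.add(report); // progress changed, so report it
                lastReport = report;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                break;
            }
        }
        return consoleLines;
    }

    public static void main(String[] args) {
        // Hypothetical sequence of progress reports from a running job.
        String[] reports = {
            "map 0% reduce 0%", "map 0% reduce 0%",
            "map 50% reduce 0%", "map 100% reduce 100%"
        };
        int[] next = {0};
        List<String> lines = pollUntilDone(
                () -> reports[next[0]++],
                () -> next[0] >= reports.length,
                1000L); // once per second, as in the description above
        lines.forEach(System.out::println);
    }
}
```

In this sketch the repeated "map 0% reduce 0%" report is recorded only once, which mirrors the "report only on change" rule.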
The job submission process implemented by JobSubmitter does the following :
- Asks the resource manager for a new application ID, which is used for the MapReduce job ID.
- Checks the output specification of the job. For example, if the output directory has not been specified or it already exists, the job is not submitted and an error is thrown to the MapReduce program.
- Computes the input splits for the job. If the splits cannot be computed (for example, because the input paths don't exist), the job is not submitted and an error is thrown to the MapReduce program.
- Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the shared filesystem in a directory named after the job ID.
- The job JAR is copied with a high replication factor, controlled by the mapreduce.client.submit.file.replication property, so that there are many copies across the cluster for the node managers to access when they run tasks for the job.
- Submits the job by calling submitApplication() on the resource manager.
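Two of the steps above, deriving the job ID from the application ID and checking the output specification, can be illustrated with a small plain-Java sketch. It is an assumption-laden model rather than JobSubmitter's actual code: SubmissionChecksSketch, toJobId, checkOutputSpec and stagingDirFor are hypothetical names, the local filesystem (java.nio.file) stands in for the distributed filesystem, an unchecked exception stands in for the error thrown to the MapReduce program, and the sample application ID is invented.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative model only; it does not use Hadoop's FileSystem or JobSubmitter.
class SubmissionChecksSketch {

    /** Derives a MapReduce job ID from an application ID by swapping the prefix. */
    static String toJobId(String applicationId) {
        String prefix = "application_";
        if (!applicationId.startsWith(prefix)) {
            throw new IllegalArgumentException("Not an application ID: " + applicationId);
        }
        return "job_" + applicationId.substring(prefix.length());
    }

    /** Rejects the job if the output directory is unspecified or already exists. */
    static void checkOutputSpec(Path outputDir) {
        if (outputDir == null) {
            throw new IllegalStateException("Output directory not set");
        }
        if (Files.exists(outputDir)) {
            throw new IllegalStateException("Output directory " + outputDir + " already exists");
        }
    }

    /** Job resources go into a directory named after the job ID (stagingRoot is hypothetical). */
    static Path stagingDirFor(Path stagingRoot, String jobId) {
        return stagingRoot.resolve(jobId);
    }

    public static void main(String[] args) {
        String jobId = toJobId("application_1410450250506_0003");
        System.out.println(jobId); // prints job_1410450250506_0003

        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(stagingDirFor(tmp.resolve("staging"), jobId));

        try {
            checkOutputSpec(tmp); // the temp directory already exists, so this is rejected
        } catch (IllegalStateException e) {
            System.out.println("Job not submitted: " + e.getMessage());
        }
    }
}
```

Failing before anything is copied or submitted keeps a misconfigured job from overwriting the output of an earlier run.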