How does MapReduce complete a job?
When the application master receives a notification that the last task for a job is complete, it changes the status for the job to “successful”. Then, when the Job polls for status, it learns that the job has completed successfully, so it prints a message to tell the user and then returns from the waitForCompletion() method. Job statistics and counters are printed at this point. The application master also sends an HTTP job notification if it is configured to do so; clients wishing to receive callbacks can configure this using the mapreduce.job.end-notification.url property. Finally, on job completion, the application master and the task containers clean up their working state (so intermediate output is deleted), and the OutputCommitter's commitJob() method is called. Job information is archived by the job history server to enable later interrogation by users if desired.
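The client side of this completion handshake can be sketched as a polling loop. This is a minimal illustration, not Hadoop's implementation: FakeJob, poll_status(), wait_for_completion() and the RUNNING/SUCCEEDED/FAILED state strings are hypothetical stand-ins. The URL helper assumes the $jobId and $jobStatus placeholders that Hadoop's configuration documentation describes for mapreduce.job.end-notification.url.

```python
import time


class FakeJob:
    """Hypothetical stand-in for a submitted job; not a Hadoop API."""

    def __init__(self, job_id, states):
        self.job_id = job_id
        # Successive states the application master would report when polled.
        self._states = list(states)

    def poll_status(self):
        # Return the next reported state; stay on the last one once exhausted.
        if len(self._states) > 1:
            return self._states.pop(0)
        return self._states[0]


def wait_for_completion(job, poll_interval_s=0.0):
    """Poll until the job leaves RUNNING, print a message, report success."""
    while True:
        state = job.poll_status()
        if state != "RUNNING":
            print(f"Job {job.job_id} completed with state {state}")
            return state == "SUCCEEDED"
        time.sleep(poll_interval_s)


def end_notification_url(template, job_id, job_status):
    # Plain string replacement of the assumed $jobId / $jobStatus placeholders.
    return template.replace("$jobId", job_id).replace("$jobStatus", job_status)
```

For example, a FakeJob reporting RUNNING twice and then SUCCEEDED makes wait_for_completion() return True after three polls; a real client would sleep between polls rather than use a zero interval.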
What happens in case of failures?
In the real world, user code is buggy, processes crash, and machines fail. One of the biggest benefits of using Hadoop is its ability to handle such failures and still allow the job to complete successfully. Any of the following can fail:
- Task
- Application master
- Node manager
- Resource manager
The most common of these is task failure. It usually occurs when user code in the map or reduce task throws a runtime exception. If this happens, the task JVM reports the error back to its parent application master before it exits, and the error ultimately makes it into the user logs. The application master marks the task attempt as failed and frees up the container so its resources are available for another task.
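That sequence can be sketched in Python, assuming a hypothetical AppMaster class and run_attempt() helper (neither is a Hadoop API): user code runs inside a wrapper, and a runtime exception is reported to the application master, which records the attempt as failed and releases its container.

```python
from dataclasses import dataclass, field


@dataclass
class AppMaster:
    """Hypothetical model of the application master's failure bookkeeping."""
    free_containers: int = 0
    failed_attempts: list = field(default_factory=list)

    def report_error(self, attempt_id, error):
        # Mark the attempt as failed, then release its container for reuse.
        self.failed_attempts.append((attempt_id, error))
        self.free_containers += 1


def run_attempt(attempt_id, user_fn, record, am):
    """Run user map/reduce code; report any exception before the attempt exits."""
    try:
        return user_fn(record)
    except Exception as exc:  # a bug in user code surfaces as an exception
        am.report_error(attempt_id, repr(exc))
        return None
```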
For Streaming tasks, if the Streaming process exits with a nonzero exit code, it is marked as failed. This behaviour is governed by the stream.non.zero.exit.is.failure property (the default is true).
Another failure mode is the sudden exit of the task JVM. Perhaps there is a JVM bug that causes the JVM to exit for a particular set of circumstances exposed by the MapReduce user code. In this case, the node manager notices that the process has exited and informs the application master, so it can mark the attempt as failed.
Hanging tasks are dealt with differently. The application master notices that it hasn't received a progress update for some time and proceeds to mark the task as failed; the task JVM process is then killed automatically. The timeout period after which tasks are considered failed is normally 10 minutes, and it can be configured on a per-job basis by setting the mapreduce.task.timeout property to a value in milliseconds. Setting the timeout to zero disables it, so long-running tasks are never marked as failed. In that case a hanging task will never free up its container, and over time the cluster may slow down as a result. This approach should be avoided; making sure that a task reports progress periodically should suffice.
When the application master is notified of a task attempt that has failed, it reschedules execution of the task, and it tries to avoid rescheduling the task on a node manager where it has previously failed. If a task fails four times, it is not retried again. This maximum number of attempts is configurable: it is controlled by the mapreduce.map.maxattempts property for map tasks and the mapreduce.reduce.maxattempts property for reduce tasks. By default, the whole job fails if any task fails four times.
For some applications, it is undesirable to abort the job if a few tasks fail, because it may still be possible to use the results of the job despite some failures. In this case, the maximum percentage of tasks that are allowed to fail without triggering job failure can be set for the job. Map tasks and reduce tasks are controlled independently, using the mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent properties.
A task attempt being killed is different from it failing. A task attempt may be killed because it is a speculative duplicate, or because the node manager it was running on failed and the application master marked all the task attempts running on it as killed. Killed task attempts do not count against the number of attempts to run the task (as set by mapreduce.map.maxattempts and mapreduce.reduce.maxattempts), because it wasn't the task's fault that an attempt was killed.
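The two attempt-level checks described above, the Streaming exit-code rule and the progress timeout, can be sketched as plain predicates. The function names are hypothetical; only the property names and defaults come from the text, and whether Hadoop compares elapsed time with > or >= is an assumption here.

```python
# Assumed default of mapreduce.task.timeout: 10 minutes in milliseconds.
DEFAULT_TASK_TIMEOUT_MS = 600_000


def streaming_attempt_failed(exit_code, non_zero_is_failure=True):
    """Mirror stream.non.zero.exit.is.failure: a nonzero exit fails the attempt."""
    return non_zero_is_failure and exit_code != 0


def is_hung(last_progress_ms, now_ms, timeout_ms=DEFAULT_TASK_TIMEOUT_MS):
    """Return True if the attempt should be failed for not reporting progress."""
    if timeout_ms == 0:
        # A timeout of zero disables the check, so the task is never failed
        # for hanging (and a hung task keeps its container indefinitely).
        return False
    return now_ms - last_progress_ms > timeout_ms
```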
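The retry and job-failure rules can likewise be sketched as bookkeeping over task records. TaskRecord, record_attempt() and job_fails() are hypothetical; max_attempts=4 mirrors the default of mapreduce.map.maxattempts and mapreduce.reduce.maxattempts, and max_failure_percent stands in for the *.failures.maxpercent properties, assumed here to default to 0 so that any task that gives up fails the job.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical per-task state kept by an application master."""
    failed_attempts: int = 0
    killed_attempts: int = 0
    gave_up: bool = False


def record_attempt(task, outcome, max_attempts=4):
    """Update a task after one attempt ends; return True if it is rescheduled."""
    if outcome == "killed":
        # Killed attempts (speculative duplicates, lost node manager) are not
        # the task's fault, so they do not count against max_attempts.
        task.killed_attempts += 1
        return True
    if outcome == "failed":
        task.failed_attempts += 1
        if task.failed_attempts >= max_attempts:
            task.gave_up = True
            return False
        return True
    return False  # the attempt succeeded; nothing to reschedule


def job_fails(tasks, max_failure_percent=0):
    """Fail the job if more than max_failure_percent of the tasks gave up."""
    failed = sum(t.gave_up for t in tasks)
    return failed * 100 > max_failure_percent * len(tasks)
```

Because the map and reduce limits are independent, job_fails() would be called once for the map tasks and once for the reduce tasks, each with its own percentage.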