Skip to content
Related Articles

Related Articles

Improve Article

Difference Between Hadoop and MapReduce

  • Last Updated : 08 Jun, 2020
Geek Week

Hadoop: Hadoop software is a framework that permits for the distributed processing of huge data sets across clusters of computers using simple programming models. In simple terms, Hadoop is a framework for processing ‘Big Data’. Hadoop was created by Doug Cutting.it was also created by Mike Cafarella. It is designed to divide from single servers to thousands of machines, each having local computation and storage. Hadoop is an open-source software. The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System(HDFS), and a processing part which may be a Map-Reduce programming model. Hadoop splits files into large blocks and distributes them across nodes during a cluster. It then transfers packaged code into nodes to process the info in parallel.

Mapreduce: MapReduce is a programming model that is used for processing and generating large data sets on clusters of computers. It was introduced by Google. Mapreduce is a concept or a method for large scale parallelization.It is inspired by functional programming’s map() and reduce() functions.
MapReduce program is executed in three stages they are:

  • Mapping: Mapper’s job is to process input data.Each node applies the map function to the local data.
  • Shuffle: Here nodes are redistributed where data is based on the output keys.(output keys are produced by map function).
  • Reduce: Nodes are now processed into each group of output data, per key in parallel.

Hadoop-vs-MapReduce

Below is a table of differences between Hadoop and MapReduce:

Based onHadoopMapReduce
DefinationThe Apache Hadoop is a software that allows all the distributed processing of large data sets across clusters of computers using simple programmingMapReduce is a programming model which is an implementation for processing and generating big data sets with distributed algorithm on a cluster.
MeaningThe name “Hadoop” was the named after Doug cutting’s son’s toy elephant. He named this project as “Hadoop” as it was easy to pronounce it.The “MapReduce” name came into existence as per the functionality itself of mapping and reducing in key-value pairs.
FrameworkHadoop not only has storage framework which stores the data but creating name node’s and data node’s it also has other frameworks which include MapReduce itself.MapReduce is a programming framework which uses a key, value mappings to sort/process the data
InventionHadoop was created by Doug Cutting and Mike Cafarella.Mapreduce is invented by Google.
Features
  • Hadoop is Open Source
  • Hadoop cluster is Highly Scalable
  • Mapreduce provides Fault Tolerance
  • Mapreduce provides High Availability
  • ConceptThe Apache Hadoop is an eco-system which provides an environment which is reliable, scalable and ready for distributed computing.MapReduce is a submodule of this project which is a programming model and is used to process huge datasets which sits on HDFS (Hadoop distributed file system).
    LanguageHadoop is a collection of all modules and hence may include other programming/scripting languages tooMapReduce is basically written in Java programming language
    Pre-requisitesHadoop runs on HDFS (Hadoop Distributed File System)MapReduce can run on HDFS/GFS/NDFS or any other distributed system for example MapR-FS

    GeeksforGeeks LIVE courses

    My Personal Notes arrow_drop_up
    Recommended Articles
    Page :