MapReduce is a programming model in Hadoop for efficiently processing big data stored in HDFS (Hadoop Distributed File System). It is a core component of Hadoop: it divides big data into small chunks and processes them in parallel.
Features of MapReduce:
- It processes huge volumes of data distributed across many servers.
- It lets users express processing as map and reduce functions.
- It protects the system against unauthorized access.
- It supports the parallel processing model.
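The map-and-reduce idea above can be sketched without a Hadoop cluster. Below is a minimal, in-memory Python emulation of a word-count job (the canonical MapReduce example); the function names `map_phase`, `reduce_phase`, and `run_job` are illustrative, not part of any Hadoop API, and the sorted `groupby` stands in for Hadoop's shuffle step.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Reducer: sum all counts emitted for the same word.
    return (word, sum(counts))

def run_job(lines):
    # Emulate the shuffle step: sort intermediate pairs, then group by key.
    intermediate = sorted(pair for line in lines for pair in map_phase(line))
    return dict(
        reduce_phase(word, (c for _, c in group))
        for word, group in groupby(intermediate, key=itemgetter(0))
    )

print(run_job(["big data big chunks", "data processed in parallel"]))
```

In real Hadoop, the mapper and reducer run as separate tasks on different nodes, and the framework handles the sort-and-shuffle between them; the structure of the code, however, is the same.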
Pig is an open-source tool built on top of the Hadoop ecosystem for easier processing of big data. Its programs are written in a high-level scripting language known as Pig Latin. It works on data stored in HDFS (Hadoop Distributed File System) and supports a variety of data types.
Features of Pig:
- It allows users to write their own user-defined functions (UDFs).
- It is extensible.
- Supports a variety of data types such as chararray, long, and float, as well as complex types like tuples, bags, and maps.
- Provides relational operations on data in HDFS such as GROUP, FILTER, JOIN, and ORDER.
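To make the FILTER and GROUP operations concrete, here is a small in-memory Python sketch of what they do conceptually; the sample relation and field names are hypothetical, and in real Pig these would be single Pig Latin statements running over HDFS data rather than Python loops.

```python
from collections import defaultdict

# Hypothetical sample relation, like a Pig relation loaded from HDFS:
# each tuple is (name, subject, score).
records = [
    ("alice", "math", 90),
    ("bob", "math", 55),
    ("alice", "physics", 75),
    ("bob", "physics", 80),
]

# FILTER: keep tuples with score >= 60
# (conceptually: FILTER records BY score >= 60).
passed = [r for r in records if r[2] >= 60]

# GROUP: collect the surviving tuples by subject
# (conceptually: GROUP passed BY subject).
by_subject = defaultdict(list)
for name, subject, score in passed:
    by_subject[subject].append((name, score))

print(dict(by_subject))
```

Each such operator is one line of Pig Latin, which the Pig engine then compiles into the equivalent MapReduce jobs; this is the "fewer lines of code" advantage discussed in the comparison below.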
Difference between MapReduce and Pig:
| S.No. | MapReduce | Pig |
|---|---|---|
| 1. | It is a data processing language. | It is a data flow language. |
| 2. | Jobs are written directly as map and reduce functions. | Queries are converted into map-reduce functions by the Pig engine. |
| 3. | It is a low-level language. | It is a high-level language. |
| 4. | It is difficult for the user to perform join operations. | It makes join operations easy to express. |
| 5. | The user has to write roughly 10 times more lines of code to perform a similar task. | The user writes fewer lines of code because it supports a multi-query approach. |
| 6. | A task often requires several chained jobs, so execution time is longer. | Compilation time is short, as Pig operators are converted into MapReduce jobs. |
| 7. | It is supported by recent versions of Hadoop. | It is supported by all versions of Hadoop. |