Apache Pig is a high-level platform for processing large datasets. It provides an abstraction over MapReduce, together with a scripting language called Pig Latin that is used to write data analysis programs.
To install Apache Pig, you must already have Hadoop and Java installed on your system.
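A quick way to confirm both prerequisites before starting is to check that the java and hadoop launchers are reachable from your shell. The sketch below assumes both are expected on your PATH; check_tool is a helper name made up for this example.

```shell
# Hypothetical helper: succeeds if the named command is on PATH
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of each prerequisite
for tool in java hadoop; do
    if check_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: NOT found, install it before continuing"
    fi
done
```

If both are found, java -version and hadoop version print the installed versions.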
Step 1: Download the latest release of Apache Pig from the official Apache Pig download page. In my case I have downloaded pig-0.17.0.tar.gz, which is the latest version at the time of writing and about 220MB in size.
Step 2: Move the downloaded Pig tar file to your desired location. In my case I am moving it to my /Documents folder.
Step 3: Extract the tar file with the command below (check that the filename matches the file you downloaded):
tar -xvf pig-0.17.0.tar.gz
Step 4: Once the file is extracted, switch to your Hadoop user. In my case it is hadoopusr. If you have not created a separate dedicated user for Hadoop, there is no need to move the extracted folder; just set the path in the .bashrc file according to where your Pig folder is located. To switch users, use the command below, or switch manually through the switch user settings.
su - hadoopusr
Step 5: Now move the extracted folder to /usr/local/, where hadoopusr can access it. Run the command below from the directory that contains the extracted folder, and make sure the folder is named pig-0.17.0 (otherwise change the command accordingly).
sudo mv pig-0.17.0 /usr/local/
Step 6: Once the folder is moved, set the environment variable for Pig's location. To do that, open the .bashrc file with the command below.
gedit ~/.bashrc
Once the file opens, add the lines that set Pig's location to the end of the file and save it.
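A minimal sketch of the lines to add, assuming the folder was moved to /usr/local/pig-0.17.0 as in Step 5. The variable name PIG_HOME is a choice made here; what matters for running pig from any directory is that its bin folder is appended to PATH.

```shell
# Pig location (assumes the folder from Step 5; adjust if yours differs)
export PIG_HOME=/usr/local/pig-0.17.0
# Make the pig launcher available from any directory
export PATH="$PATH:$PIG_HOME/bin"
```

A terminal that was already open will not see these variables until you run source ~/.bashrc or open a new one.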
Step 7: Next, check whether the path has been configured correctly.
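One way to run this check, assuming Pig lives in /usr/local/pig-0.17.0 as in Step 5, is to look for its bin folder among the PATH entries. on_path is a helper name invented for this sketch. If your terminal was open before you edited .bashrc, reload the file first with source ~/.bashrc.

```shell
# Hypothetical helper: succeeds if the given directory is one of the PATH entries
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# /usr/local/pig-0.17.0/bin is the location assumed from Step 5
if on_path /usr/local/pig-0.17.0/bin; then
    echo "Pig bin directory is on PATH"
else
    echo "Pig bin directory is missing from PATH; re-check ~/.bashrc"
fi
```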
Step 8: Once the configuration is correct, Pig is successfully installed on our Hadoop single-node setup. Now start Pig with the pig command.
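Typing pig on its own starts the interactive Grunt shell in MapReduce mode, which expects the Hadoop daemons to be running. The sketch below wraps that in a small function so the execution mode can be chosen; start_pig is a made-up helper name, while -x is Pig's own option for selecting the mode.

```shell
# Hypothetical helper: start the Grunt shell in the given execution mode.
# "mapreduce" (the default) runs against Hadoop; "local" uses only the local file system.
start_pig() {
    pig -x "${1:-mapreduce}"
}
```

Calling start_pig is equivalent to typing pig, and start_pig local is equivalent to pig -x local. Type quit at the grunt> prompt to leave the shell.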
Step 9: You can check your Pig version with the pig -version command.
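The version check can be sketched as follows. The guard around it is an addition for this example, so that a missing PATH entry produces a readable hint instead of a "command not found" error; show_pig_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the Pig version, or a hint if pig is not on PATH
show_pig_version() {
    if command -v pig >/dev/null 2>&1; then
        pig -version
    else
        echo "pig not found on PATH; re-check ~/.bashrc"
    fi
}

show_pig_version
```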