Apache Pig is a high-level platform for processing large datasets. It provides an abstraction over MapReduce, along with a high-level scripting language known as Pig Latin, which is used to write data analysis programs.
In order to install Apache Pig, you must have Hadoop and Java installed on your system.
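Before starting, it can help to confirm that both prerequisites are reachable from your shell. The snippet below is a minimal sketch that only checks whether the `java` and `hadoop` launchers are on your PATH; the function name `check_prereqs` is a hypothetical helper, not part of Hadoop or Pig.

```shell
# check_prereqs is a hypothetical helper: it reports whether each
# prerequisite launcher can be found on PATH (it does not check versions).
check_prereqs() {
    for tool in java hadoop; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

check_prereqs
```

If either tool is reported as missing, install or configure it before continuing with the steps below.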
Step 1: Download the latest release of Apache Pig from this Link. In my case I have downloaded pig-0.17.0.tar.gz, which was the latest version at the time of writing and is about 220MB in size.
Step 2: Now move the downloaded Pig tar file to your desired location. In my case I am moving it to my /Documents folder.
Step 3: Now extract the tar file with the command below (make sure to check your tar filename):
tar -xvf pig-0.17.0.tar.gz
Step 4: Once it is extracted, it's time to switch to our Hadoop user. In my case it is hadoopusr. If you have not created a separate dedicated user for Hadoop, there is no need to move the folder; just set the path in the .bashrc file according to where you extracted Pig. To switch user, you can use the command below, or switch manually through the switch user settings.
su - hadoopusr
Step 5: Now we need to move the extracted folder to /usr/local so that hadoopusr can use it. Run the command below from the directory that contains the extracted folder (make sure the folder is named pig-0.17.0; otherwise change the command accordingly).
sudo mv pig-0.17.0 /usr/local/
Step 6: Now that the folder has been moved, we need to set the environment variable for Pig's location. For that, open the .bashrc file with the command below.
sudo gedit ~/.bashrc
Once the file opens, add the path to your Pig installation to it and save the file.
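A minimal sketch of the lines to add, assuming Pig was moved to /usr/local/pig-0.17.0 as in Step 5. The variable name `PIG_HOME` is an assumption made for this example; adjust the path if your folder name or location differs.

```shell
# Pig location (assumes the folder from Step 5: /usr/local/pig-0.17.0)
export PIG_HOME=/usr/local/pig-0.17.0

# Append Pig's bin directory so the "pig" launcher can be found
export PATH="$PATH:$PIG_HOME/bin"
```

Appending to PATH, rather than replacing it, keeps the existing Hadoop and Java entries intact.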
Step 7: Then check whether you have configured it correctly using the command below:
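One way to do this check, sketched under the assumption that Pig's bin directory was added to PATH in ~/.bashrc: reload the file, then ask the shell where it finds `pig`. The helper name `pig_on_path` is hypothetical and not part of Pig.

```shell
# Run this first in your terminal to reload the configuration:
#   source ~/.bashrc

# pig_on_path is a hypothetical helper: it reports whether the shell
# can locate the pig launcher through the current PATH.
pig_on_path() {
    if command -v pig >/dev/null 2>&1; then
        echo "pig found at $(command -v pig)"
    else
        echo "pig not found on PATH"
        return 1
    fi
}

pig_on_path || echo "recheck the lines added to ~/.bashrc"
```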
Step 8: Once that is correct, we have successfully installed Pig on our Hadoop single-node setup. Now we start Pig with the pig command.
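As a rough sketch, the launcher can be wrapped in a small guard so that a missing PATH entry produces a clear message. The function name `start_pig` is a hypothetical helper; the `-x` option selects Pig's execution mode, where `mapreduce` is the default and expects Hadoop to be running, and `local` runs against the local file system.

```shell
# start_pig is a hypothetical wrapper around the pig launcher.
# Usage: start_pig            (MapReduce mode, same as plain "pig")
#        start_pig local      (local mode, no Hadoop cluster needed)
start_pig() {
    mode="${1:-mapreduce}"
    if ! command -v pig >/dev/null 2>&1; then
        echo "pig not found on PATH"
        return 1
    fi
    # Opens the interactive Grunt shell in the chosen mode
    pig -x "$mode"
}
```

Typing `pig` directly has the same effect as `start_pig` with no argument; when it works, you land at the `grunt>` prompt.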
Step 9: You can check your Pig version with the command below.
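A short sketch of the version check, assuming `pig` is on your PATH. The variable name `pig_version` is only for illustration; it falls back to "unavailable" when the launcher cannot be run.

```shell
# "pig -version" prints the installed Pig version;
# fall back to a placeholder if the launcher cannot be run.
pig_version="$(pig -version 2>/dev/null || echo "unavailable")"
echo "Pig version: $pig_version"
```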