Snakebite Python Package For Hadoop HDFS

Prerequisite: Hadoop and HDFS

Snakebite is a popular Python package, developed by Spotify, that lets Python applications access HDFS programmatically. It provides a Python client library that uses protobuf messages to communicate directly with the NameNode, which stores all the metadata. File permissions, logs, and the locations where the data blocks are created are all part of this metadata. The package also includes a command-line interface (CLI) that is built on the client library.

Let’s discuss how to install and configure the snakebite package for HDFS.

Requirement:

  • Python 2 and python-protobuf 2.4.1 or higher are required for snakebite.

The snakebite library can be installed easily with pip.



# Use the pip that belongs to Python 2; otherwise importing the module will fail
pip install snakebite

If snakebite is already installed, as it is here, pip reports that the requirement is already satisfied.
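One quick way to confirm that the package landed in the interpreter you intend to use is to try importing it. The following is a minimal sketch; the helper name is_importable is hypothetical and not part of snakebite.

```python
from __future__ import print_function

import importlib
import sys


def is_importable(module_name):
    """Return True if this interpreter can import module_name."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Show which interpreter is running and whether it can see snakebite.
print("Python version:", sys.version.split()[0])
print("snakebite importable:", is_importable("snakebite"))
```

If the second line prints False, pip most likely installed the package for a different Python version than the one running the script.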

The Client Library

The client library is written in Python and uses the Hadoop RPC protocol and protobuf messages to communicate with the NameNode, which handles all the metadata of the cluster. With this client library, Python applications communicate directly with HDFS (the Hadoop Distributed File System) without having to run hdfs dfs through a system call.
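For contrast, the system-call approach that the client library avoids usually means running hdfs dfs in a subprocess and parsing its text output. The sketch below parses only a sample listing, so it runs without a cluster. The sample text and the helper name parse_ls_output are illustrative assumptions, and the column layout is assumed to match typical hdfs dfs -ls output.

```python
from __future__ import print_function

# Illustrative sample of `hdfs dfs -ls /` output (not taken from a real cluster).
# A real script might obtain this text with
# subprocess.check_output(["hdfs", "dfs", "-ls", "/"]).
SAMPLE_OUTPUT = """Found 2 items
drwx------   - hduser supergroup          0 2015-09-20 14:36 /tmp
-rw-r--r--   1 hduser supergroup       1024 2015-09-20 14:40 /input.txt
"""


def parse_ls_output(text):
    """Turn `hdfs dfs -ls` text into a list of dicts, one per entry."""
    entries = []
    for line in text.splitlines():
        # Split into at most 8 columns so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # skip the "Found N items" header and blank lines
        perms, replication, owner, group, size, date, time, path = fields
        entries.append({
            "path": path,
            "is_dir": perms.startswith("d"),
            "owner": owner,
            "group": group,
            "length": int(size),
        })
    return entries


for entry in parse_ls_output(SAMPLE_OUTPUT):
    print(entry["path"], entry["length"])
```

With the client library none of this text parsing is needed, because the listing arrives as structured Python data.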

Let’s write a simple Python program to understand how the snakebite package works.

Task: List all the content of the root directory of HDFS using the Snakebite client library.

Step1: Create a Python file named list_down_root_dir.py at your desired location on the system.

cd Documents/        # Change to the Documents directory (choose any location you like)

touch list_down_root_dir.py      # touch creates an empty file in a Linux environment



Step2: Write the code below in the list_down_root_dir.py file.

# import the client class from the snakebite package
from snakebite.client import Client
# create a client connection to the HDFS NameNode (host and RPC port)
client = Client('localhost', 9000)
# client.ls() takes a list of paths; iterate over the entries in the root directory
for x in client.ls(['/']):
    print(x)



Client() explanation:

The Client() constructor accepts the following arguments:

  • host (string): Hostname or IP address of the NameNode.
  • port (int): RPC port of the NameNode.
  • hadoop_version (int): Hadoop protocol version (default: 9).
  • use_trash (boolean): Use trash when removing files.
  • effective_user (string): Effective user for the HDFS operations (default: the current user).

The host and port can be found in the core-site.xml file, and can be configured as needed.
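The host and port lookup in core-site.xml mentioned above can also be done from Python. The sketch below is a minimal example under a few assumptions: the NameNode address is stored in the fs.defaultFS property (or the older fs.default.name), the XML shown is a hypothetical sample rather than a real configuration, and the helper name namenode_address is made up for illustration.

```python
from __future__ import print_function

import xml.etree.ElementTree as ET

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Hypothetical core-site.xml content; a real file lives in the Hadoop
# configuration directory and can be read with open(path).read().
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""


def namenode_address(core_site_xml):
    """Return (host, port) from the default file system URI, or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        # fs.default.name is the older, deprecated name of the same setting.
        if prop.findtext("name") in ("fs.defaultFS", "fs.default.name"):
            url = urlparse(prop.findtext("value").strip())
            return url.hostname, url.port
    return None


print(namenode_address(SAMPLE_CORE_SITE))
```

The returned pair can then be passed to Client() as the host and port arguments instead of hard-coding them.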

Step3: Start the Hadoop daemons with the commands below.

start-dfs.sh   # start the NameNode, DataNode, and Secondary NameNode

start-yarn.sh  # start the ResourceManager and NodeManager

Step4: Run the list_down_root_dir.py file and observe the result.

python list_down_root_dir.py

The output lists all the content available in the root directory of HDFS.
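Each item printed by the loop is a dictionary describing one file or directory. The sketch below formats such a dictionary as a compact line. The key names (path, file_type, permission, owner, group, length) are assumptions about what client.ls() yields, the sample entry is hypothetical, and format_entry is a made-up helper, so verify the keys against your own output before relying on them.

```python
from __future__ import print_function

# Hypothetical entry, shaped like the dictionaries client.ls() is assumed to yield.
SAMPLE_ENTRY = {
    "path": "/tmp",
    "file_type": "d",      # assumed: "d" for directories, "f" for files
    "permission": 448,     # decimal value of octal 700
    "owner": "hduser",
    "group": "supergroup",
    "length": 0,
}


def format_entry(entry):
    """Format one listing entry as: kind, octal mode, owner:group, size, path."""
    kind = "dir " if entry["file_type"] == "d" else "file"
    return "{0} {1:o} {2}:{3} {4:>10} {5}".format(
        kind, entry["permission"], entry["owner"], entry["group"],
        entry["length"], entry["path"])


print(format_entry(SAMPLE_ENTRY))
```

In list_down_root_dir.py, the loop body could call print(format_entry(x)) instead of printing the raw dictionary.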
