Creating Files in HDFS using Python Snakebite

Hadoop is a popular big data framework written in Java, but you do not have to use Java to work with it. Other programming languages, such as Python and C++, can also be used. C++ code for Hadoop can be written with the Hadoop Pipes API, which uses sockets to let the task tracker communicate with the process running the C++ code.

Python can also be used to write code for Hadoop. Snakebite is a popular library for communicating with HDFS. With the Python client library provided by the Snakebite package, we can easily write Python code that works on HDFS. The library uses protobuf messages to communicate directly with the NameNode, so it works with HDFS without making a system call to hdfs dfs.

Prerequisite: The Snakebite library should be installed.

Make sure Hadoop is running. If it is not, start all the daemons: first the NameNode, DataNode, and Secondary NameNode, then the ResourceManager and NodeManager.

Task: Create directories in HDFS with the mkdir() method of the Snakebite package.

Step 1: Create a Python file in a local directory of your choice.

cd Documents/        # change directory to Documents (choose any directory you like)

touch      # touch creates an empty file in a Linux environment (pass the name of your Python file)

Step 2: Write the code below in the Python file.


# import the Snakebite client
from snakebite.client import Client

# create a client connection to the HDFS NameNode
client = Client('localhost', 9000)

# create the directories listed in mkdir()'s first argument (a list of paths)
for p in client.mkdir(['/demo/demo1', '/demo2'], create_parent=True):
    print(p)

The mkdir() method takes a list of the directory paths we want to create. create_parent=True ensures that a missing parent directory is created first. In our case, the demo directory is created first, and then demo1 is created inside it.
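To make the parent-first ordering concrete, here is a minimal pure-Python sketch. It does not talk to HDFS and is not how Snakebite implements mkdir(); the helper name expand_with_parents is hypothetical and only models which directories end up existing, and in what order, when create_parent=True is used.

```python
import posixpath


def expand_with_parents(paths):
    """Return every directory that must exist, parents before children."""
    ordered = []
    for path in paths:
        # Walk from the requested path up towards the root, collecting ancestors.
        chain = []
        current = posixpath.normpath(path)
        while current.strip("/"):
            chain.append(current)
            current = posixpath.dirname(current)
        # Reverse so the top-most parent comes before its children.
        for directory in reversed(chain):
            if directory not in ordered:
                ordered.append(directory)
    return ordered


print(expand_with_parents(['/demo/demo1', '/demo2']))
# prints ['/demo', '/demo/demo1', '/demo2']
```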

Step 3: Run the file and observe the result.

python   # this creates the directories listed in the mkdir() argument

In the output, 'result': True states that we have successfully created the directory.
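Rather than reading the printed output by eye, the results can be collected in code. The sketch below assumes that each item yielded by mkdir() is a dict with a 'result' key (as described above) and a 'path' key; the 'path' key and the helper name create_directories are assumptions, so check them against the output of your Snakebite version.

```python
def create_directories(client, paths):
    """Create paths via client.mkdir and split them into created and failed."""
    created, failed = [], []
    for entry in client.mkdir(paths, create_parent=True):
        # Assumption: each entry looks like {'path': '/demo2', 'result': True}.
        if entry.get('result'):
            created.append(entry.get('path'))
        else:
            failed.append(entry.get('path'))
    return created, failed


# Usage against a running cluster (host and port as in the article):
# from snakebite.client import Client
# created, failed = create_directories(Client('localhost', 9000),
#                                      ['/demo/demo1', '/demo2'])
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real NameNode.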

Step 4: We can check whether the directories were created, either by browsing HDFS manually or with the commands below.

hdfs dfs -ls /       # list all the directories in the root folder

hdfs dfs -ls /demo   # list all the directories present in the demo folder

The listings show that all the directories were created successfully.
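The same check can be sketched from Python with the client's ls() method. This is a hedged example: it assumes ls() accepts a list of paths and yields one dict per entry with 'path' and 'file_type' keys, where 'file_type' is 'd' for a directory. The helper name list_directories is hypothetical.

```python
def list_directories(client, path):
    """Return the paths of the directories directly under `path`."""
    directories = []
    for entry in client.ls([path]):
        # Assumption: 'file_type' is 'd' for directories and 'f' for files.
        if entry.get('file_type') == 'd':
            directories.append(entry.get('path'))
    return directories


# Usage against a running cluster (host and port as in the article):
# from snakebite.client import Client
# client = Client('localhost', 9000)
# print(list_directories(client, '/'))
# print(list_directories(client, '/demo'))
```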

Last Updated : 14 Oct, 2020