Creating Files in HDFS using Python Snakebite

Hadoop is a popular big data framework written in Java, but you do not have to use Java to work with it. Other programming languages, such as Python and C++, can also be used. C++ code for Hadoop can be written with the Hadoop Pipes API, which uses sockets as the channel through which the task tracker communicates with the process running the C++ code.

Python can also be used to write code for Hadoop. Snakebite is one of the popular libraries for communicating with HDFS. With the Python client library provided by the Snakebite package, we can easily write Python code that works on HDFS. The library uses protobuf messages to communicate directly with the NameNode, so it works with HDFS without invoking the hdfs dfs command.

Prerequisite: The Snakebite library should be installed.

Make sure Hadoop is running. If it is not, start all the daemons with the commands below.

start-dfs.sh             # start the NameNode, DataNode and Secondary NameNode

start-yarn.sh            # start the ResourceManager and NodeManager



Task: Create directories in HDFS using the mkdir() method of the snakebite package.

Step 1: Create a file named create_directory.py at the desired location in your local file system.

cd Documents/        # change to the Documents directory (you can choose any location)

touch create_directory.py      # the touch command creates an empty file in a Linux environment

Step 2: Write the code below in the create_directory.py file.


# import the snakebite client class
from snakebite.client import Client

# create a client connection to the HDFS NameNode
# (host and port must match the NameNode address of your cluster)
client = Client('localhost', 9000)

# mkdir() takes the directories to create as a list in its first argument;
# it returns a generator, so iterate over it to perform the calls
for p in client.mkdir(['/demo/demo1', '/demo2'], create_parent=True):
    print(p)



The mkdir() method takes a list of the directory paths we want to create. create_parent=True ensures that any parent directory that does not exist yet is created first. In our case, the demo directory is created first, and then demo1 is created inside it.
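To make the effect of create_parent=True concrete, the sketch below computes locally which directories are implied by a list of paths, with parents listed before their children. It does not contact HDFS; the NameNode handles parent creation itself, and the helper name implied_directories is hypothetical, used only for illustration.

```python
import posixpath


def implied_directories(paths):
    """Return every directory implied by `paths`, parents before children.

    Purely local illustration: nothing here talks to HDFS.
    """
    ordered = []
    for path in paths:
        chain = []
        current = posixpath.normpath(path)
        # Walk upwards from the requested path until the root is reached.
        while current not in ('/', ''):
            chain.append(current)
            current = posixpath.dirname(current)
        # Reverse so that parents come before the directories inside them.
        for directory in reversed(chain):
            if directory not in ordered:
                ordered.append(directory)
    return ordered


print(implied_directories(['/demo/demo1', '/demo2']))
```

For the paths used in this article, the list starts with /demo, followed by /demo/demo1 and /demo2, which matches the creation order described above.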

Step 3: Run the create_directory.py file and observe the result.



python create_directory.py   # this creates the directories passed to mkdir()

In the output, 'result': True indicates that the directory was created successfully.
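Since each entry printed by the script carries a 'result' field, a small helper can collect the paths that were not created. This is a minimal sketch: the helper name failed_paths is hypothetical, the presence of an 'error' key on failed entries is an assumption, and the sample data is made up to mirror the shape of the output rather than taken from a real cluster.

```python
def failed_paths(results):
    """Return (path, error) pairs for every entry whose 'result' is not True."""
    failures = []
    for entry in results:
        if not entry.get('result'):
            # 'error' is assumed to be present on failures; fall back if it is not.
            failures.append((entry.get('path'), entry.get('error', 'unknown error')))
    return failures


# Made-up sample shaped like the entries yielded by mkdir(); not real output.
sample = [
    {'path': '/demo/demo1', 'result': True},
    {'path': '/demo2', 'result': False, 'error': 'hypothetical failure'},
]
print(failed_paths(sample))
```

In create_directory.py, the same helper could be applied to list(client.mkdir([...], create_parent=True)) instead of the sample list.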

Step 4: We can check whether the directories were created either by browsing HDFS manually or with the commands below.

hdfs dfs -ls /       # list everything in the root directory

hdfs dfs -ls /demo   # list everything inside the demo directory

The output of these commands should show that all the directories were created successfully.
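The same check can also be done from Python. The sketch below assumes that the snakebite client's ls() method accepts a list of paths and yields one dictionary per entry, with a 'path' key and a 'file_type' key that is 'd' for directories. The helper name list_directories and the FakeClient class are hypothetical; the fake client exists only so the helper can be tried without a running cluster.

```python
def list_directories(client, path):
    """Return the paths of the directories found directly under `path`."""
    return [entry['path']
            for entry in client.ls([path])
            if entry.get('file_type') == 'd']


class FakeClient(object):
    """Hypothetical offline stand-in for snakebite's Client (made-up data)."""

    def ls(self, paths):
        # Yield entries shaped like the assumed ls() output.
        yield {'path': '/demo', 'file_type': 'd'}
        yield {'path': '/demo2', 'file_type': 'd'}
        yield {'path': '/notes.txt', 'file_type': 'f'}


print(list_directories(FakeClient(), '/'))

# Against a real cluster (assumes the NameNode listens on localhost:9000):
#   from snakebite.client import Client
#   print(list_directories(Client('localhost', 9000), '/'))
```

Passing the client in as an argument keeps the helper independent of how the connection is created, which is why it can be exercised with a stand-in object.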
