Snakebite is a popular Python library for communicating with HDFS. Its client library lets us write Python code that works on HDFS directly: it talks to the NameNode using protobuf messages instead of making a system call to hdfs dfs. Snakebite does not support Python 3.
The hdfs dfs command provides many subcommands for performing operations on HDFS. The Snakebite client library offers methods for many of the same operations, including retrieving data from HDFS. The text() method reads the data from a file stored on HDFS. Let's perform a quick task to see how to retrieve data from a file in HDFS.
Task: Retrieving File Data From HDFS.
Step 1: Create a text file with the name data.txt and add some data to it.
cd Documents/   # change to the Documents directory (choose any directory you like)
touch data.txt  # touch creates an empty file in a Linux environment
nano data.txt   # nano is a command-line text editor for Unix and Linux systems
cat data.txt    # show the contents of the file
Step 2: Send this data.txt file to Hadoop HDFS with the help of copyFromLocal Command.
hdfs dfs -copyFromLocal /path1 /path2 .... /pathN /destination
Use this command to send data.txt to the root directory of HDFS.
hdfs dfs -copyFromLocal /home/dikshant/Documents/data.txt /
Now check whether the file has reached the root directory of HDFS with the command below.
hdfs dfs -ls /
You can also check it manually in the NameNode web UI at http://localhost:50070/ (the default port in Hadoop 2.x; Hadoop 3.x uses 9870) under Utilities -> Browse the file system.
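The same check can also be made from Python with Snakebite. The following is a minimal sketch, not a definitive implementation: it assumes the NameNode RPC endpoint is localhost:9000 (use the host and port from fs.defaultFS in your core-site.xml), and the helper name hdfs_paths is hypothetical. Snakebite's ls() takes a list of paths and yields one dictionary per entry.

```python
def hdfs_paths(client, directory="/"):
    """Return the paths of the entries directly under an HDFS directory."""
    # client.ls() takes a list of paths and yields one dict per entry.
    return [entry["path"] for entry in client.ls([directory])]


def main():
    # Imported here so hdfs_paths() can be tried with any object that has ls().
    from snakebite.client import Client

    client = Client("localhost", 9000)  # assumed NameNode host and RPC port
    for path in hdfs_paths(client):
        print(path)
```

Calling main() should list /data.txt among the entries if the upload succeeded.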
Step 3: Now our task is to read the data from the data.txt file we sent to HDFS. Create a file data_read.py in your local file system and add Python code that reads the file through the Snakebite client.
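A minimal sketch of data_read.py could look like the following. It assumes the NameNode RPC endpoint is localhost:9000 (an assumption; take the real values from fs.defaultFS in core-site.xml), and the helper name read_hdfs_files is hypothetical. The text() method takes a list of HDFS paths and returns a generator, so nothing is read until it is consumed.

```python
def read_hdfs_files(client, paths):
    """Return the text content of each HDFS path, in order."""
    # client.text() takes a list of paths and yields the content of each file.
    return list(client.text(paths))


def main():
    # Imported here so read_hdfs_files() works with any object that has text().
    from snakebite.client import Client  # Snakebite does not support Python 3

    client = Client("localhost", 9000)  # assumed NameNode host and RPC port
    for content in read_hdfs_files(client, ["/data.txt"]):
        print(content)
```

To run it as a script, add a call to main() as the last line of the file.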
Client() explanation:
The Client() constructor accepts the following arguments:
- host (string): IP address of the NameNode.
- port (int): RPC port of the NameNode.
- hadoop_version (int): Hadoop protocol version (default: 9).
- use_trash (boolean): Use trash when removing files.
- effective_user (string): Effective user for the HDFS operations (default: the current user).
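The arguments listed above can be passed by keyword. The sketch below is illustrative only: client_settings and make_client are hypothetical helper names, and the host, port, and user values in the usage note are assumptions. Note that Snakebite spells the user keyword effective_user.

```python
def client_settings(host, port, effective_user=None, use_trash=False):
    """Collect keyword arguments for snakebite.client.Client."""
    settings = {
        "host": host,
        "port": port,
        "hadoop_version": 9,  # the default protocol version listed above
        "use_trash": use_trash,
    }
    # Leave effective_user out so Snakebite falls back to the current user.
    if effective_user is not None:
        settings["effective_user"] = effective_user
    return settings


def make_client(**kwargs):
    """Build a Snakebite client from the settings above."""
    # Imported here so client_settings() can be used without Snakebite installed.
    from snakebite.client import Client

    return Client(**client_settings(**kwargs))
```

For example, make_client(host="localhost", port=9000, effective_user="hdfs") would connect as the hdfs user, assuming such a user exists on the cluster.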
Step 4: Run the data_read.py file and observe the result.
We have successfully fetched the data from data.txt with the help of the client library.
We can also copy any file from HDFS to our local file system with Snakebite. To do so, create a file fetch_file.py and add Python code that calls the copyToLocal() method.
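A minimal sketch of fetch_file.py follows. It assumes the NameNode RPC endpoint is localhost:9000 and uses /home/dikshant/desktop as the destination directory; both values should be adapted to your setup, and the helper name fetch_to_local is hypothetical. copyToLocal() takes a list of HDFS paths and a local destination and returns a generator, so the copy only happens once the generator is consumed.

```python
def fetch_to_local(client, hdfs_paths, local_dir):
    """Copy HDFS files into local_dir and return the per-file result dicts."""
    # copyToLocal() is lazy: nothing is copied until the generator is consumed.
    return list(client.copyToLocal(hdfs_paths, local_dir))


def main():
    # Imported here so fetch_to_local() works with any object that has copyToLocal().
    from snakebite.client import Client

    client = Client("localhost", 9000)  # assumed NameNode host and RPC port
    for result in fetch_to_local(client, ["/data.txt"], "/home/dikshant/desktop"):
        print(result)
```

Each printed dictionary describes one copied file, so a failed copy shows up in the output instead of passing silently. To run it as a script, add a call to main() as the last line of the file.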
Now run this Python file and check its output.
We can observe that the file has now been copied to my /home/dikshant/desktop directory.