Python Snakebite comes with a CLI (Command Line Interface) client, a pure-Python client library for HDFS. To use the snakebite CLI, we must know the hostname or IP address of the NameNode and its RPC port. We can record these NameNode details, such as the hostname (localhost here) and the RPC (Remote Procedure Call) port, in our own configuration file. In this demonstration, we will use a simpler approach: passing the host and port values directly on the command line. RPC is the protocol the client uses to invoke operations on the remote NameNode, as if calling a local procedure.
The host and port values we use here can be found in the fs.default.name property of the hadoop/etc/hadoop/core-site.xml file on your system. The Snakebite CLI documentation has more information about Snakebite CLI configuration.
We can also check the fs.default.name property value with the command below.
hdfs getconf -confKey fs.defaultFS   # fs.default.name also works, but it is deprecated in favour of fs.defaultFS
Let's also look up the fs.default.name property manually in the core-site.xml file on our system to find the host and port.
We can see that our default host is localhost and the port is 9000.
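The core-site.xml lookup above can also be done programmatically. The sketch below parses a Hadoop-style XML configuration with Python's standard library; the inline XML snippet is an illustrative stand-in for the real hadoop/etc/hadoop/core-site.xml on disk.

```python
import xml.etree.ElementTree as ET

# A minimal core-site.xml snippet, inlined here for illustration;
# on a real system this content lives in hadoop/etc/hadoop/core-site.xml.
CORE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""

def get_property(xml_text, key):
    """Return the <value> for the given <name> in a Hadoop XML config,
    or None if the property is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None

print(get_property(CORE_SITE_XML, "fs.default.name"))  # hdfs://localhost:9000
```

This prints the same hdfs://localhost:9000 value we saw with hdfs getconf, confirming the host (localhost) and RPC port (9000).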
Usage Of Snakebite CLI
With the help of the python snakebite CLI, we can easily run most of the commands that we use with hdfs dfs, like ls, mv, rm, put, get, du, df, etc. So let's perform some basic operations to understand how the Snakebite CLI works.
Using Snakebite CLI via a full HDFS URI on the command line – e.g. hdfs://namenode_host:port/path
1. Listing all the directories available in the root directory of HDFS
snakebite ls hdfs://localhost:9000/<path>
snakebite ls hdfs://localhost:9000/
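Every snakebite command above takes a full HDFS URI. For illustration, here is how the hdfs://namenode_host:port/path form breaks down, using Python's standard urllib.parse (the /sample/data.txt path is just an example value):

```python
from urllib.parse import urlparse

# Decompose the kind of HDFS URI that snakebite commands accept.
uri = urlparse("hdfs://localhost:9000/sample/data.txt")

print(uri.scheme)    # hdfs              -> the HDFS URI scheme
print(uri.hostname)  # localhost         -> NameNode host
print(uri.port)      # 9000              -> NameNode RPC port
print(uri.path)      # /sample/data.txt  -> path inside HDFS
```

The hostname and port here are exactly the fs.default.name values from core-site.xml, which is why passing a full URI removes the need for a separate snakebite configuration file.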
2. Removing a file from HDFS
snakebite rm hdfs://localhost:9000/<file_path_with_name>
snakebite rm hdfs://localhost:9000/data.txt
3. Creating a Directory (the directory is named /sample in my case)
snakebite mkdir hdfs://localhost:9000/<path_with_directory_name>
snakebite mkdir hdfs://localhost:9000/sample
4. Removing a Directory (the directory is named /sample in my case)
snakebite rmdir hdfs://localhost:9000/sample
Now, with the above examples, we have an idea of how to use the snakebite command-line interface. The important difference between the snakebite CLI and hdfs dfs is that snakebite is a pure Python client library and does not use any Java libraries to communicate with HDFS. Because it avoids starting a JVM for every command, snakebite typically interacts with HDFS faster than hdfs dfs.
CLI Command Reference
The Python Snakebite library provides many facilities for working with HDFS. All the switches and commands can be listed for reference with the help of a simple snakebite command.
We can observe that, for the commands available in hdfs dfs, similar commands are also available in the snakebite command-line interface. Let's run a few more to get a better insight into the snakebite CLI.
Check the snakebite version with the below command
1. cat: It is used to print the contents of a file
snakebite cat hdfs://localhost:9000/test.txt
2. copyToLocal (or) get: Copies files/folders from the HDFS store to the local file system.
snakebite copyToLocal <source> <destination>
snakebite copyToLocal hdfs://localhost:9000/test.txt /home/dikshant/Pictures
3. touchz: It creates an empty file.
snakebite touchz hdfs://localhost:9000/<path_with_file_name>
snakebite touchz hdfs://localhost:9000/demo_file
4. du: Displays disk usage statistics
snakebite du hdfs://localhost:9000/             # show disk usage of the root directory
snakebite du hdfs://localhost:9000/Hadoop_File  # show disk usage of the existing /Hadoop_File directory
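du reports sizes in raw bytes. A small helper like the one below (a sketch of our own, not part of snakebite) can turn those counts into human-readable figures:

```python
def human_readable(num_bytes):
    """Format a raw byte count into a human-readable size string,
    dividing by 1024 until the value fits the next unit."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} PB"

print(human_readable(9000))       # 8.8 KB
print(human_readable(134217728))  # 128.0 MB
```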
5. stat: It gives the stats of a directory or file, such as the last modified time.
snakebite stat hdfs://localhost:9000/
snakebite stat hdfs://localhost:9000/Hadoop_File
6. setrep: This command is used to change the replication factor of a file/directory in HDFS. By default, it is 3 for anything stored in HDFS (as set by the dfs.replication property in hdfs-site.xml).
snakebite setrep 5 hdfs://localhost:9000/test.txt
In the below image, we can observe that we have changed the replication factor from 1 to 5 for the test.txt file.
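As a rough illustration of what the replication factor means for storage, the physical footprint of a file is its logical size multiplied by its replication factor (the 128 MB file size below is a hypothetical figure, not taken from the example above):

```python
def physical_size(logical_bytes, replication_factor):
    """Approximate raw HDFS storage consumed by a file: each block
    is stored replication_factor times across the cluster."""
    return logical_bytes * replication_factor

file_size = 128 * 1024 * 1024  # a hypothetical 128 MB file

print(physical_size(file_size, 1))  # 134217728 bytes at replication factor 1
print(physical_size(file_size, 5))  # 671088640 bytes after setrep 5
```

So raising the replication factor from 1 to 5, as in the setrep example, multiplies the raw storage used by the file fivefold, in exchange for better fault tolerance.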
Similarly, we can perform multiple operations on HDFS using python snakebite CLI.