
hashlib module in Python


A cryptographic hash function takes input data of any size and produces a fixed-length output (the hash, or digest) that is, in practice, unique to that particular data: changing even a single byte of the input produces a completely different digest. This makes hashes useful for checking the integrity of data. In this article, you will learn how to use the hashlib module to obtain the hash of a file in Python. The hashlib module is part of the Python standard library, so it does not need to be installed with pip; it can be imported directly:

import hashlib
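To illustrate the integrity idea, here is a minimal sketch that compares the digest of some data against a digest recorded earlier. The helper names sha256_hex and is_unchanged are hypothetical, not part of hashlib.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_hex: str) -> bool:
    """Hypothetical helper: True if data still matches a previously recorded digest."""
    return sha256_hex(data) == recorded_hex


original = b"quarterly report v1"
recorded = sha256_hex(original)  # digest stored alongside the data

print(is_unchanged(original, recorded))                # True
print(is_unchanged(b"quarterly report v2", recorded))  # False
print(len(recorded))                                   # 64
```

The digest is always 64 hexadecimal characters for SHA-256, however large the input is.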

What is the Hashlib Module?

The hashlib module implements a common interface to many secure hash and message digest algorithms. There is one named constructor for each type of hash (for example, hashlib.sha256()), and all of them return a hash object with the same simple interface. Constructors for the guaranteed algorithms, such as sha1(), sha224(), sha256(), sha384(), sha512(), blake2b() and blake2s(), are always present in this module.
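A short sketch of the common interface: the named constructor hashlib.sha256() and the generic hashlib.new(), which takes the algorithm name as a string, produce equivalent hash objects. The input bytes are an arbitrary example.

```python
import hashlib

data = b"hello world"

# Named constructor: one per algorithm
h1 = hashlib.sha256(data)

# Generic constructor: the algorithm is given by name as a string
h2 = hashlib.new("sha256", data)

# Both return hash objects with the same interface
print(h1.name, h2.name)                  # sha256 sha256
print(h1.hexdigest() == h2.hexdigest())  # True

# The same calls work for any supported algorithm
for name in ("sha1", "sha256", "sha512"):
    h = hashlib.new(name)
    h.update(data)
    print(name, h.digest_size, h.hexdigest())
```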

hashlib.algorithms_guaranteed

A set containing the names of the hash algorithms that are guaranteed to be supported by this module on all platforms.

>>> print(hashlib.algorithms_guaranteed)

{'sha3_512', 'sha1', 'sha224', 'shake_256', 'sha3_384', 'sha512', 'sha384', 'blake2s', 'md5', 'sha3_224', 'sha256', 'blake2b', 'sha3_256', 'shake_128'}
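As a sketch of how this set can be used, the loop below reports the digest size of each guaranteed algorithm. The SHAKE variants are skipped because their output length is chosen by the caller. The function name digest_sizes is hypothetical, and on unusual FIPS-restricted builds constructing md5 may fail.

```python
import hashlib


def digest_sizes():
    """Map each guaranteed fixed-length algorithm to its digest size in bytes."""
    sizes = {}
    for name in sorted(hashlib.algorithms_guaranteed):
        if name.startswith("shake_"):
            # SHAKE is an extendable-output function: the caller chooses
            # the output length, so there is no single fixed size to report
            continue
        sizes[name] = hashlib.new(name).digest_size
    return sizes


for name, size in digest_sizes().items():
    print(f"{name}: {size} bytes")

# SHAKE needs an explicit length: here 16 bytes (32 hex characters)
print(hashlib.shake_128(b"data").hexdigest(16))
```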

hashlib.algorithms_available

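A set containing the names of the hash algorithms that are available in the running Python interpreter. The same algorithm may appear multiple times in this set under different names (due to OpenSSL), and the exact contents depend on the platform and the OpenSSL library Python was built with.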

>>> print(hashlib.algorithms_available)

{'sha384', 'sha3_224', 'whirlpool', 'ripemd160', 'blake2s', 'md5-sha1', 'sm3', 'sha256', 'shake_256', 'sha1', 'sha3_384', 'sha512', 'blake2b', 'sha512_256', 'sha3_256', 'shake_128', 'sha3_512', 'sha224', 'md5', 'mdc2', 'sha512_224', 'md4'}
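Because algorithms_available varies between installations, code that needs an optional algorithm can check for it first. A minimal sketch, assuming a hypothetical helper hex_digest_if_available; it also catches ValueError because some OpenSSL builds list an algorithm that they then refuse to construct. It is intended for fixed-length algorithms, since SHAKE variants need a length argument.

```python
import hashlib

# Every guaranteed algorithm is also available in the running interpreter
print(hashlib.algorithms_guaranteed <= hashlib.algorithms_available)  # True


def hex_digest_if_available(name, data):
    """Hypothetical helper: hash data with the algorithm `name`, or return
    None if this interpreter (and its OpenSSL build) cannot provide it."""
    if name not in hashlib.algorithms_available:
        return None
    try:
        return hashlib.new(name, data).hexdigest()
    except ValueError:
        # Listed as available, but the OpenSSL build rejects it
        return None


print(hex_digest_if_available("sha256", b"abc"))
print(hex_digest_if_available("ripemd160", b"abc"))  # may be None on some builds
```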

Explanation of SHA-256 Algorithm and its Features

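This article uses SHA-256, a secure hash algorithm standardized by NIST in FIPS 180-4, to obtain the file hash. Other hash algorithms supported by hashlib include:

  • MD5 (Message Digest 5), which is no longer considered secure against collisions and should only be used for non-security checksums
  • SHA-512 (Secure Hash Algorithm, 512-bit digest)
  • BLAKE2b, optimized for 64-bit platforms, with digests of up to 64 bytes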

SHA-256 is used here because it is one of the most widely deployed hash algorithms, is currently considered secure, and is fast enough to hash large files. It belongs to the SHA-2 family. The later SHA-3 family (also available in hashlib, for example as sha3_256()) is based on a different design, the sponge construction.
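Two of these properties can be checked directly: SHA-256 always produces 256 bits (64 hex characters) regardless of input size, and SHA3-256, despite having the same output size, is a different algorithm and gives a different digest for the same input. The inputs below are arbitrary examples.

```python
import hashlib

tiny = hashlib.sha256(b"a").hexdigest()
large = hashlib.sha256(b"a" * 1_000_000).hexdigest()

# 256-bit output = 32 bytes = 64 hex characters, whatever the input size
print(len(tiny), len(large))  # 64 64

# SHA3-256 has the same output size but a different (sponge-based)
# construction, so the digests of the same input differ
sha2 = hashlib.sha256(b"hello").hexdigest()
sha3 = hashlib.sha3_256(b"hello").hexdigest()
print(len(sha3), sha2 != sha3)  # 64 True
```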

Obtaining a Cryptographic Hash of a File

In the following example, the path to a file is provided as a command-line argument. The script then computes the SHA-256 (Secure Hash Algorithm, 256-bit) hash of the file and displays it.

The hash of the following file is computed:

[Figure: test.txt]

First, the hashlib and sys modules are imported; sys gives the code access to the command-line arguments. Next, the hashfile() function, which computes the SHA-256 hash of a file, is defined. Inside it, a buffer size is set (65536 bytes in this case). This is the number of bytes read from the file at a time and fed into the SHA-256 hash object, so large files can be hashed without loading them into memory all at once. At the end of the function, hexdigest() is called on the hash object to return the digest as a hexadecimal string. The script calls hashfile() with sys.argv[1], the first argument given when the script is run from the command line (sys.argv[0] is the script's file name). Finally, the hash of the file is printed.

Python3

# importing sys for getting command-line arguments
import sys

# importing hashlib for the sha256() hash function
import hashlib


def hashfile(file):

    # An arbitrary (but fixed) buffer size:
    # 65536 bytes = 64 kilobytes
    BUF_SIZE = 65536

    # Creating a new SHA-256 hash object
    sha256 = hashlib.sha256()

    # Opening the file whose path was passed in, in binary mode
    with open(file, 'rb') as f:
        while True:
            # Reading up to BUF_SIZE bytes from the file
            data = f.read(BUF_SIZE)

            # read() returns an empty bytes object at end of file
            if not data:
                break

            # Feeding that chunk into the SHA-256 hash object
            sha256.update(data)

    # hexdigest() returns the digest of all the data passed to
    # update() so far, as a string of hexadecimal digits
    return sha256.hexdigest()


# Calling hashfile() on the path given as the first
# command-line argument and saving the result in a variable
file_hash = hashfile(sys.argv[1])

print(f"Hash:{file_hash}")

                    

Output:

[Figure: output of the script for test.txt]
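The same chunked reading can be written more compactly with the two-argument form of iter() and functools.partial, and on Python 3.11 and later hashlib.file_digest() performs the chunked reading itself. A minimal sketch under those assumptions; the function name sha256_of_file is hypothetical.

```python
import functools
import hashlib

BUF_SIZE = 65536  # read 64 KiB at a time, as in the example above


def sha256_of_file(path: str) -> str:
    """Hypothetical helper: return the SHA-256 hex digest of the file at path."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # Python 3.11+
            return hashlib.file_digest(f, "sha256").hexdigest()
        sha256 = hashlib.sha256()
        # iter() keeps calling f.read(BUF_SIZE) until it returns b"" (end of file)
        for chunk in iter(functools.partial(f.read, BUF_SIZE), b""):
            sha256.update(chunk)
        return sha256.hexdigest()
```

Either branch gives the same result as hashing the whole file contents in one call; the chunked form only keeps memory use bounded.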

 

Obtaining a Cryptographic Hash of a String

The same approach can also be used to obtain the hash of a string. Because hash objects work on bytes, a string must first be converted to bytes (for example, with str.encode()) before it is passed to update(). For short strings, all the data can be passed in a single update() call. The following example demonstrates this in practice:

First, a bytes literal (note the b prefix) is created and stored in a variable. Then a SHA-256 hash object is created, and the bytes are passed to its update() method, which feeds the data into the hash. Finally, hexdigest() returns the digest as a hexadecimal string, and this hash value is printed.

Python3

# importing hashlib for the sha256() hash function
import hashlib


# A bytes literal (note the b prefix), so it can be
# passed directly to the hash object
string = b"My name is apple and I am a vegetable?"

# Creating a new SHA-256 hash object
sha256 = hashlib.sha256()

# Passing the bytes to the hash object
sha256.update(string)

# hexdigest() returns the digest of all the data passed
# to update() so far, as a string of hexadecimal digits
string_hash = sha256.hexdigest()


print(f"Hash:{string_hash}")

                    

Output:

Hash:252f8ca07a6fcaae293e5097151c803a7f16504e48c4eb60f651c11341e83217
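The example above starts from a bytes literal. The sketch below, reusing the same text, illustrates a few related points: a str has to be encoded to bytes first (UTF-8 is assumed here), passing the data to the constructor in one call gives the same digest as feeding it through update() in pieces, and digest() returns the same value as hexdigest() but as raw bytes. It also shows that hexdigest() does not finalize the hash object, so data added afterwards is included in later digests.

```python
import hashlib

text = "My name is apple and I am a vegetable?"

# Hash objects only accept bytes, so passing the str directly fails
try:
    hashlib.sha256(text)
except TypeError as exc:
    print("TypeError:", exc)

# Encode the str to bytes first (UTF-8 is assumed here)
data = text.encode("utf-8")

# One call: pass the bytes straight to the constructor
one_shot = hashlib.sha256(data).hexdigest()

# Incremental: feeding the same bytes in two pieces gives the same digest
h = hashlib.sha256()
h.update(data[:10])
h.update(data[10:])
print(h.hexdigest() == one_shot)  # True

# digest() returns the raw 32 bytes; hexdigest() is the same value in hex
raw = h.digest()
print(len(raw), raw.hex() == one_shot)  # 32 True

# hexdigest() does not finalize the object: more data can still be added
h.update(b" More text")
print(h.hexdigest() == hashlib.sha256(data + b" More text").hexdigest())  # True
```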

Last Updated : 06 Feb, 2023