Finding MD5 of Files Recursively in Directory in Python

MD5 (Message Digest Algorithm 5) is a hash function that takes an input (or message) of any length and produces a 128-bit (16-byte) hash value, usually written as a 32-character hexadecimal string. The MD5 of a file is the hash computed from that file's content. It acts as a compact fingerprint of the content: any change to the file almost certainly changes the hash. The hash is not guaranteed to be unique, however, and because MD5 collisions can be constructed deliberately, it is suitable for detecting accidental changes or duplicate files but not for security-sensitive uses.
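As a quick check of the sizes mentioned above, here is a minimal sketch using the standard-library hashlib module. The input strings are arbitrary examples chosen for illustration.

```python
import hashlib

# Hash a short in-memory byte string (the inputs here are arbitrary examples).
digest = hashlib.md5(b"hello")

raw = digest.digest()        # raw bytes of the hash
text = digest.hexdigest()    # the same hash as a hexadecimal string

print(len(raw), len(text))   # 16 32
print(text)                  # 5d41402abc4b2a76b9719d911017c592

# Changing a single character gives a different digest.
other = hashlib.md5(b"hellp").hexdigest()
print(other != text)         # True
```

The 16 raw bytes and the 32 hexadecimal characters are two views of the same 128-bit value; each byte is written as two hex digits.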

The examples below use a file named "GFG.txt" and compute the MD5 hash of its content in Python.

The GFG.txt file contains the following data:

Hello Geeks, This is a Text File

Finding MD5 of a Single File in Python Using hashlib Module

The code below calculates the MD5 hash of the file specified by the 'file_path' variable. The calculate_md5() function opens the file in binary mode and reads it in 4096-byte chunks, updating the MD5 hasher with each chunk, so the whole file never has to be loaded into memory at once. It returns the hash as a hexadecimal string, which is then printed.

import hashlib


def calculate_md5(file_path):
    hasher = hashlib.md5()
    with open(file_path, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


file_path = "GFG.txt"  # Path to your file
md5_hash = calculate_md5(file_path)
print("MD5 hash of the file:", md5_hash)

Output:

MD5 hash of the file: 2595ddffecba213cfa0e914f004b9fe0
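On Python 3.11 and later, the standard library also provides hashlib.file_digest(), which performs the chunked reading itself. The sketch below is one possible variant, not part of the original example: the helper name calculate_md5_compat and the temporary test file are assumptions made for illustration, and the function falls back to the manual loop on older Python versions.

```python
import hashlib
import os
import tempfile


def calculate_md5_compat(file_path, chunk_size=4096):
    """Return the MD5 hex digest of a file (hypothetical helper name)."""
    with open(file_path, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: the standard library handles the chunked reading.
            return hashlib.file_digest(f, "md5").hexdigest()
        # Older versions: fall back to the manual chunked loop.
        hasher = hashlib.md5()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
        return hasher.hexdigest()


# Usage: hash a temporary file and compare with hashing the same bytes directly.
data = b"example data\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    file_md5 = calculate_md5_compat(tmp.name)
    print(file_md5 == hashlib.md5(data).hexdigest())  # True
finally:
    os.remove(tmp.name)
```

Both branches produce the same digest, because the hash depends only on the bytes fed to it and not on how they are split into chunks.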

Finding MD5 of Files Recursively in Directory in Python Using os Module

The code below uses the hashlib and os modules to compute the MD5 hash of every file under a directory. The md5() function hashes a single file by reading it in 8192-byte chunks; the := operator in its loop requires Python 3.8 or later. The find_md5_hashes() function walks the directory tree recursively with os.walk() and returns a dictionary that maps each file path to its MD5 hash. Finally, the script prints each file path together with its hash.

import hashlib
import os

def md5(file_path):
    # Calculate the MD5 hash of a file.
    with open(file_path, 'rb') as f:
        file_hash = hashlib.md5()
        while chunk := f.read(8192):
            file_hash.update(chunk)
    return file_hash.hexdigest()

def find_md5_hashes(directory):
    # Find MD5 hashes of files recursively in a directory.
    hashes = {}
    for root, _, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            hashes[file_path] = md5(file_path)
    return hashes

# Printing Hashes
directory_path = r"C:\Users\shrav\Desktop\GFG"
file_hashes = find_md5_hashes(directory_path)
for file_path, md5_hash in file_hashes.items():
    print(f'File: {file_path}, MD5: {md5_hash}')

Output:

File: C:\Users\shrav\Desktop\GFG\GFG.py, MD5: f29dc1c874af005f04f38dd17498e0b0
File: C:\Users\shrav\Desktop\GFG\GFG.txt, MD5: 2595ddffecba213cfa0e914f004b9fe0
File: C:\Users\shrav\Desktop\GFG\test1.txt, MD5: 29f96cae61a1ccbddd6c62114d1d3d72
File: C:\Users\shrav\Desktop\GFG\test2.txt, MD5: 1f932f76028fd53cfb2bebf3ad5b19db
File: C:\Users\shrav\Desktop\GFG\test3.txt, MD5: 5c01152040c21ac9bc4bb22bbc8e8233
File: C:\Users\shrav\Desktop\GFG\test4.txt, MD5: 4c38f706f01574461152ef75bd70a14a
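The recursive traversal can also be sketched with pathlib instead of os.walk(). The variant below is an illustration under stated assumptions rather than the article's method: the names md5_of_file and find_md5_hashes_sorted are hypothetical, paths are reported relative to the starting directory, results are sorted for a stable order, and files that cannot be read are collected in a separate list instead of stopping the run. It builds a small temporary directory so that it runs on its own.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=8192):
    """Hash one file in fixed-size chunks (hypothetical helper name)."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_md5_hashes_sorted(directory):
    """Return ({relative_path: md5}, [(relative_path, error)]) for a tree."""
    root = Path(directory)
    hashes = {}
    skipped = []
    # rglob("*") yields every entry below root; sorting gives a stable order.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        rel = path.relative_to(root).as_posix()
        try:
            hashes[rel] = md5_of_file(path)
        except OSError as exc:
            # Unreadable file (for example, permission denied): record and go on.
            skipped.append((rel, str(exc)))
    return hashes, skipped


# Usage on a throwaway directory tree created just for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    (base / "sub").mkdir()
    (base / "a.txt").write_bytes(b"hello")
    (base / "sub" / "b.txt").write_bytes(b"")
    demo_hashes, demo_skipped = find_md5_hashes_sorted(base)

for rel_path, md5_hash in demo_hashes.items():
    print(f"File: {rel_path}, MD5: {md5_hash}")
# File: a.txt, MD5: 5d41402abc4b2a76b9719d911017c592
# File: sub/b.txt, MD5: d41d8cd98f00b204e9800998ecf8427e
```

Relative paths make the output comparable between two copies of the same directory stored in different locations, which is a common reason to hash a tree in the first place.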