
Verify Integrity of Files Using Digest in Python

Last Updated : 15 Mar, 2024

Data integrity is a critical aspect of file management: it ensures that files remain unaltered during transmission or storage. In Python, one effective way to verify file integrity is to use cryptographic hash functions and the digests they produce. A digest is a fixed-size value computed by a hash function from a file's content; with a strong hash function such as SHA-256, two different files are extremely unlikely to share a digest. In this article, we'll explore how to verify the integrity of files using digests in Python through a step-by-step guide.

What is Digest in Python?

In Python, a digest is the result of applying a hash function (such as SHA-256 or MD5) to the content of a file. This fixed-size string acts as a fingerprint of the file's content. If the content changes, even by a single byte, the digest changes as well, which gives a reliable way to detect alterations. MD5 is no longer considered collision-resistant, so SHA-256 is the safer choice when deliberate tampering is a concern.
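
As a small illustration of that sensitivity, the sketch below hashes two byte strings that differ in a single character. The sample strings are made up for this example and are not part of the verification script.

```python
import hashlib

# Two inputs that differ by a single byte (illustrative data only).
original = b"The quick brown fox"
modified = b"The quick brown fix"

digest_original = hashlib.sha256(original).hexdigest()
digest_modified = hashlib.sha256(modified).hexdigest()

print(digest_original)
print(digest_modified)
# A SHA-256 hex digest is always 64 characters long, whatever the input size.
print(len(digest_original), len(digest_modified))
print("Digests match:", digest_original == digest_modified)
```

The two printed digests should look unrelated even though the inputs are almost identical, which is what makes a digest useful for detecting changes.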

How To Verify Integrity Of Files Using Digest In Python?

Below is a step-by-step guide on how to verify the integrity of files using a digest in Python:

Install Colorama Library

To incorporate the Colorama library, which is not included in the default Python installation, execute the following command to install it:

pip install colorama


Step 1: Library Imports

In the code below, the required libraries are imported. The argparse library is used for parsing command-line arguments, hashlib for cryptographic hash functions, and sys for system-specific parameters and functions. The init and Fore names from colorama are used to print colored text in the terminal.

Python3
# Import the necessary libraries required.
import argparse
import hashlib
import sys
# Import the functions init and Fore from the colorama library.
from colorama import init, Fore

Step 2: Hash Calculation Function

The code below defines a function calculate_hash that takes a file path as an argument and calculates the SHA-256 hash of the file using a hash object. It reads the file in 64KB chunks, so large files never have to fit in memory at once, and updates the hash object with each chunk.

Python3
# Define a function to calculate the SHA-256 hash of a file.
def calculate_hash(file_path):
    # Create a SHA-256 hash object.
    sha256_hash = hashlib.sha256()
    # Open the file in binary mode for reading (rb).
    with open(file_path, "rb") as file:
        # Read the file in 64KB chunks to efficiently handle large files.
        while True:
            data = file.read(65536)  # Read the file in 64KB chunks.
            if not data:
                break
            # Update the hash object with the data read from the file.
            sha256_hash.update(data)
    return sha256_hash.hexdigest()
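
On Python 3.11 and later, the standard library offers hashlib.file_digest, which performs the chunked reading internally. The sketch below is one possible variant rather than the article's own function: the name calculate_hash_compat and the algorithm parameter are assumptions introduced for illustration, and the code falls back to a manual loop on older versions.

```python
import hashlib


def calculate_hash_compat(file_path, algorithm="sha256"):
    """Return the hex digest of a file (hypothetical helper, not from the article)."""
    with open(file_path, "rb") as file:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: hashlib reads the binary file object in chunks for us.
            return hashlib.file_digest(file, algorithm).hexdigest()
        # Older Python versions: read 64KB chunks until read() returns b"".
        hash_object = hashlib.new(algorithm)
        for chunk in iter(lambda: file.read(65536), b""):
            hash_object.update(chunk)
        return hash_object.hexdigest()
```

Both branches should produce the same digest as calculate_hash for a given file; the difference is only in who does the chunked reading.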

Step 3: Hash Verification Function

Here, a function verify_hash is defined, which takes a downloaded file path and an expected hash value as arguments. It calculates the hash of the downloaded file using the calculate_hash function and compares it with the expected hash value, returning a boolean result.

Python3
def verify_hash(downloaded_file, expected_hash):
    calculated_hash = calculate_hash(downloaded_file)
    return calculated_hash == expected_hash
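
Note that the comparison in verify_hash is case-sensitive, while hexdigest() always returns lowercase characters, so an expected hash pasted in uppercase or with stray whitespace would be reported as a mismatch. A minimal sketch of a more forgiving variant follows; the name verify_hash_normalized is hypothetical, and the use of hmac.compare_digest (a constant-time comparison from the standard library) is an optional choice made here, not something the article prescribes.

```python
import hashlib
import hmac


def verify_hash_normalized(file_path, expected_hash):
    """Compare a file's SHA-256 digest with expected_hash, ignoring case and
    surrounding whitespace (hypothetical helper, not from the article)."""
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as file:
        for chunk in iter(lambda: file.read(65536), b""):
            sha256_hash.update(chunk)
    calculated = sha256_hash.hexdigest()
    # hexdigest() is lowercase, so bring the expected value to the same form.
    expected = expected_hash.strip().lower()
    # Encode to bytes so compare_digest accepts any input text without error.
    return hmac.compare_digest(calculated.encode("utf-8"), expected.encode("utf-8"))
```

With this variant, an uppercase copy of the correct hash is accepted, while any differing value is still rejected.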

Step 4: Command-Line Argument

In this code, a command-line argument parser is created using argparse. Two arguments are defined: -f or --file for the downloaded file path and --hash for the expected hash value. Both are marked as required.

Python3
parser = argparse.ArgumentParser(
    description="Verify the hash of a file that is downloaded.")
parser.add_argument("-f", "--file", dest="downloaded_file",
                    required=True, help="path for the file downloaded")
parser.add_argument("--hash", dest="expected_hash",
                    required=True, help="Expected hash value is")
args = parser.parse_args()
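
To try the parser without opening a terminal, parse_args also accepts an explicit list of arguments. The sketch below wraps the same two arguments in a hypothetical build_parser function; the file name example.bin and the hash value abc123 are placeholders, not real data.

```python
import argparse


def build_parser():
    """Build a parser with the same two required arguments (hypothetical wrapper)."""
    parser = argparse.ArgumentParser(
        description="Verify the hash of a downloaded file.")
    parser.add_argument("-f", "--file", dest="downloaded_file",
                        required=True, help="path to the downloaded file")
    parser.add_argument("--hash", dest="expected_hash",
                        required=True, help="expected hash value")
    return parser


# Passing a list simulates: python verify.py -f example.bin --hash abc123
args = build_parser().parse_args(["-f", "example.bin", "--hash", "abc123"])
print(args.downloaded_file)  # example.bin
print(args.expected_hash)    # abc123
```

Because both arguments use required=True, leaving either one out makes argparse print a usage message and exit with status 2 before any of the script's own code runs.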

Step 5: Argument Validation and Hash Verification

Finally, this part checks whether the required command-line arguments are provided. If not, it prints an error message in red and exits. Because both arguments are declared with required=True, argparse already rejects a missing argument, so this check mainly guards against empty values. If the arguments are present, the code verifies the hash using the verify_hash function. Depending on the result, it prints a success message in green or a failure message in red.

Python3
if not args.downloaded_file or not args.expected_hash:
    print(
        f"{Fore.RED}[-] Please Specify the file in order to validate and its Hash.")
    sys.exit()
if verify_hash(args.downloaded_file, args.expected_hash):
    print(
        f"{Fore.GREEN}[+] Hash verification occurred successfully. The software is original.")
else:
    print(
        f"{Fore.RED}[-] Hash verification has failed, which means the software may have been tampered or is not original.")
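
The code above prints a message but exits with status 0 in every case, and a missing file would surface as an unhandled exception. If the script is meant to be called from other scripts, one option is to report the outcome through the exit status as well. The sketch below shows that idea under stated assumptions: the function name check_file and the status values 0, 1, and 2 are choices made for this example, and colors are omitted to keep it free of third-party imports.

```python
import hashlib
import sys


def check_file(file_path, expected_hash):
    """Return 0 if the digest matches, 1 if it differs, 2 if the file cannot be
    read (hypothetical helper; the status values are an assumption)."""
    sha256_hash = hashlib.sha256()
    try:
        with open(file_path, "rb") as file:
            for chunk in iter(lambda: file.read(65536), b""):
                sha256_hash.update(chunk)
    except OSError as error:
        # Covers a missing file, a directory path, or a permission problem.
        print(f"[-] Could not read {file_path}: {error}", file=sys.stderr)
        return 2
    if sha256_hash.hexdigest() == expected_hash.strip().lower():
        print("[+] Hash verification succeeded.")
        return 0
    print("[-] Hash verification failed.")
    return 1
```

In the script, the final step could then become sys.exit(check_file(args.downloaded_file, args.expected_hash)), so a calling shell or build tool can react to the result.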

Step 6: Run the Command in Terminal

After writing the code above, open the command prompt, go to the directory where you saved the Python program, and run the following command:

python verify.py -f [path to the file, including its name and extension] --hash [expected hash value]

Complete Code

This code initiates Colorama for colored text, defines functions to calculate and verify SHA-256 hash of a file, and utilizes argparse for command-line argument parsing to check the integrity of a downloaded file by comparing its hash with an expected value, printing success or failure messages accordingly. The script ensures proper validation of command-line arguments and outputs informative messages about hash verification results.

Python3
import argparse
import hashlib
import sys
from colorama import init, Fore

# Initialize colorama for colored terminal text.
init()

# Define a function to calculate SHA-256 hash of a file.


def calculate_hash(file_path):
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as file:
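        # Read 64KB chunks until read() returns an empty bytes object.
        # The assignment expression (:=) requires Python 3.8 or later.
        while (data := file.read(65536)):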
            sha256_hash.update(data)
    return sha256_hash.hexdigest()

# Function to verify hash of a downloaded file.


def verify_hash(downloaded_file, expected_hash):
    calculated_hash = calculate_hash(downloaded_file)
    return calculated_hash == expected_hash


# Command-line argument parsing.
parser = argparse.ArgumentParser(description="Verify downloaded file's hash.")
parser.add_argument("-f", "--file", dest="downloaded_file",
                    required=True, help="Path of the downloaded file")
parser.add_argument("--hash", dest="expected_hash",
                    required=True, help="Expected hash value")
args = parser.parse_args()

# Validate arguments and perform hash verification.
if not args.downloaded_file or not args.expected_hash:
    print(
        f"{Fore.RED}[-] Please specify the file and its hash for validation.")
    sys.exit()

if verify_hash(args.downloaded_file, args.expected_hash):
    print(
        f"{Fore.GREEN}[+] Hash verification successful. The file is original.")
else:
    print(
        f"{Fore.RED}[-] Hash verification failed. This may indicate tampering or non-original software.")

Output:

C:\Users\kisha\PycharmProjects\gfg-integrity>python verify.py -f C:\Users\kisha\Downloads\Programs\vlc-3.0.20-win64.exe --hash d8055b6643651ca5b9ad58c438692a481483657f3f31624cdfa68b92e8394a57
[+] Hash verification occurred successfully. The software is original.


Conclusion

In conclusion, we covered what data integrity means and why it matters for files that are transmitted or stored. We saw how a digest produced by a hash function such as SHA-256 or MD5 reveals whether a file has changed, and we built a command-line script that verifies the integrity of a downloaded file by comparing its SHA-256 digest with an expected value.


