Verify Integrity of Files Using Digest in Python
Last Updated: 15 Mar, 2024
Data integrity is a critical aspect of file management: it means that files remain unaltered during transmission or storage. In Python, one effective way to verify file integrity is to use cryptographic hash functions and the digests they produce. A digest is a fixed-size value that a hash function computes from the content of a file. For a secure hash function it is computationally infeasible to find two different files with the same digest, so the digest acts as a fingerprint of the content. In this article, we’ll explore how to verify the integrity of files using digests in Python through a step-by-step guide.
What is a Digest in Python?
In Python, a digest is the result of applying a hash function (such as SHA-256 or MD5) to the content of a file. This fixed-size value serves as a fingerprint of the file’s content. If the content changes, even by a single byte, the digest changes as well, which provides a reliable way to detect alterations. Note that MD5 is no longer considered collision-resistant, so SHA-256 is the safer choice when deliberate tampering is a concern.
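As a quick illustration of that sensitivity, the following minimal sketch hashes two byte strings that differ in a single character using hashlib.sha256. The sample inputs are arbitrary and chosen only for demonstration.

```python
import hashlib

# Two inputs that differ by a single character (arbitrary sample data).
original = b"hello world"
modified = b"hello worle"

digest_original = hashlib.sha256(original).hexdigest()
digest_modified = hashlib.sha256(modified).hexdigest()

print(digest_original)
print(digest_modified)

# A SHA-256 hex digest is always 64 characters long, whatever the input size.
print(len(digest_original), len(digest_modified))

# The digests differ even though the inputs are almost identical.
print(digest_original == digest_modified)
```

The same idea applies to files: the file's bytes are fed to the hash object instead of a short literal.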
How To Verify Integrity Of Files Using Digest In Python?
Below is a step-by-step guide to verifying the integrity of files using a digest in Python:
Install Colorama Library
To incorporate the Colorama library, which is not included in the default Python installation, execute the following command to install it:
pip install colorama
Step 1: Library Imports
In the code below, the required libraries are imported. The argparse library is used for parsing command-line arguments, hashlib for cryptographic hash functions, and sys for system-specific parameters and functions. From colorama, init and Fore are imported to print colored terminal text.
Python3
# Import the necessary libraries required.
import argparse
import hashlib
import sys
# Import the functions init and Fore from the colorama library.
from colorama import init, Fore
Step 2: Hash Calculation Function
The code below defines a function calculate_hash that takes a file path as an argument and calculates the SHA-256 hash of the file using a hash object. It reads the file in 64KB chunks, so large files do not have to be loaded into memory at once, and updates the hash object with each chunk.
Python3
# Define a function to calculate the SHA-256 hash of a file.
def calculate_hash(file_path):
    # Create a SHA-256 hash object.
    sha256_hash = hashlib.sha256()
    # Open the file in binary mode for reading (rb).
    with open(file_path, "rb") as file:
        # Read the file in 64KB chunks to efficiently handle large files.
        while True:
            data = file.read(65536)
            # An empty bytes object means the end of the file was reached.
            if not data:
                break
            # Update the hash object with the data read from the file.
            sha256_hash.update(data)
    # Return the digest as a hexadecimal string.
    return sha256_hash.hexdigest()
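The same chunked-read pattern works for any algorithm that hashlib supports. The sketch below is a variation and not part of the original script: the name calculate_digest and its algorithm and chunk_size parameters are assumptions introduced for illustration. It uses hashlib.new to select the algorithm by name.

```python
import hashlib


def calculate_digest(file_path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in fixed-size chunks.

    Hypothetical helper: the name and parameters are illustrative.
    """
    # hashlib.new raises ValueError if the algorithm name is not supported.
    hash_obj = hashlib.new(algorithm)
    with open(file_path, "rb") as file:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: file.read(chunk_size), b""):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()
```

With a helper like this, a call such as calculate_digest("example.bin", "sha512") would produce a SHA-512 digest of a (placeholder) file without changing the reading logic.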
Step 3: Hash Verification Function
Here, a function verify_hash is defined, which takes a downloaded file path and an expected hash value as arguments. It calculates the hash of the downloaded file using the calculate_hash function and compares it with the expected hash value, returning a boolean result.
Python3
def verify_hash(downloaded_file, expected_hash):
    calculated_hash = calculate_hash(downloaded_file)
    return calculated_hash == expected_hash
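Published checksums are often written in upper case or copied with stray whitespace, in which case a plain == comparison against the lower-case output of hexdigest() reports a mismatch. A minimal sketch of a more forgiving comparison follows. The helper name digests_match is hypothetical, and hmac.compare_digest is used here only as a conventional constant-time comparison, which is optional for this use case.

```python
import hmac


def digests_match(calculated_hash, expected_hash):
    """Compare two hex digests, ignoring case and surrounding whitespace.

    Hypothetical helper, not part of the original script.
    """
    calculated = calculated_hash.strip().lower()
    expected = expected_hash.strip().lower()
    # compare_digest accepts two ASCII-only strings and returns a bool;
    # hex digests contain only ASCII characters.
    return hmac.compare_digest(calculated, expected)
```

Normalizing both sides keeps the check tolerant of formatting differences while still requiring every hexadecimal digit to match.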
Step 4: Command-Line Argument
In this code, a command-line argument parser is created using argparse. Two arguments are defined: -f or --file for the downloaded file path, and --hash for the expected hash value. Both are marked as required.
Python3
parser = argparse.ArgumentParser(
    description="Verify the hash of a file that is downloaded.")
parser.add_argument("-f", "--file", dest="downloaded_file",
                    required=True, help="path of the downloaded file")
parser.add_argument("--hash", dest="expected_hash",
                    required=True, help="expected hash value")
args = parser.parse_args()
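To see what the parser produces without opening a terminal, parse_args can be given an explicit list of arguments. The sketch below rebuilds the same parser inside a hypothetical build_parser function; the file name and hash string are placeholder values.

```python
import argparse


def build_parser():
    """Build the same two-argument parser (the function name is illustrative)."""
    parser = argparse.ArgumentParser(
        description="Verify the hash of a file that is downloaded.")
    parser.add_argument("-f", "--file", dest="downloaded_file",
                        required=True, help="path of the downloaded file")
    parser.add_argument("--hash", dest="expected_hash",
                        required=True, help="expected hash value")
    return parser


# Placeholder values; passing a list means sys.argv is not consulted.
args = build_parser().parse_args(["-f", "example.bin", "--hash", "abc123"])
print(args.downloaded_file)  # example.bin
print(args.expected_hash)    # abc123
```

The dest names decide which attributes appear on the returned namespace, which is why the later code reads args.downloaded_file and args.expected_hash.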
Step 5: Argument Validation and Hash Verification
Finally, this part checks whether the command-line arguments are provided. If not, it prints an error message in red and exits. Because both arguments are declared with required=True, argparse already stops with a usage error when one is missing, so this check mainly guards against empty values. If the arguments are present, the code verifies the hash using the verify_hash function and, depending on the result, prints a success message in green or a failure message in red.
Python3
# Initialize colorama so the color codes also work on Windows terminals.
init()
if not args.downloaded_file or not args.expected_hash:
    print(
        f"{Fore.RED}[-] Please Specify the file in order to validate and its Hash.")
    # Exit with a non-zero status to signal an error.
    sys.exit(1)
if verify_hash(args.downloaded_file, args.expected_hash):
    print(
        f"{Fore.GREEN}[+] Hash verification occurred successfully. The software is original.")
else:
    print(
        f"{Fore.RED}[-] Hash verification has failed, which means the software may have been tampered or is not original.")
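The snippet above ends with a traceback if the file path does not exist, and a hash mismatch still leaves the process exit status at 0. One possible way to make the result usable from other scripts is to return distinct exit codes. The following sketch is a variation rather than the article's code: the function name check_file and the exit-code values 0, 1 and 2 are illustrative choices, and colored output is omitted so that only the standard library is needed.

```python
import hashlib
import sys


def calculate_hash(file_path):
    """Return the SHA-256 hex digest of a file, read in 64KB chunks."""
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as file:
        while True:
            data = file.read(65536)
            if not data:
                break
            sha256_hash.update(data)
    return sha256_hash.hexdigest()


def check_file(file_path, expected_hash):
    """Return 0 on a match, 1 on a mismatch, 2 if the file cannot be read.

    Hypothetical helper; the exit-code values are an assumption.
    """
    try:
        calculated_hash = calculate_hash(file_path)
    except OSError as error:
        # Covers a missing file, a directory path, or a permission problem.
        print(f"[-] Could not read {file_path}: {error}", file=sys.stderr)
        return 2
    if calculated_hash == expected_hash:
        print("[+] Hash verification succeeded.")
        return 0
    print("[-] Hash verification failed.")
    return 1
```

In a script, the return value could be handed to sys.exit, for example sys.exit(check_file(args.downloaded_file, args.expected_hash)), so that a calling shell or batch file can react to the outcome.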
Step 6: Run the Command in Terminal
Once the code above is saved (for example as verify.py), open the command prompt, change to the directory that contains the script, and run the following command:
python verify.py -f [path to the file, including its name and extension] --hash [expected hash]
Complete Code
This code initializes Colorama for colored text, defines functions to calculate and verify the SHA-256 hash of a file, and uses argparse to parse the command-line arguments. It checks the integrity of a downloaded file by comparing its hash with an expected value and prints a success or failure message accordingly.
Python3
import argparse
import hashlib
import sys
from colorama import init, Fore

# Initialize colorama for colored terminal text.
init()


# Define a function to calculate SHA-256 hash of a file.
def calculate_hash(file_path):
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as file:
        # The := operator requires Python 3.8 or newer.
        while (data := file.read(65536)):
            sha256_hash.update(data)
    return sha256_hash.hexdigest()


# Function to verify hash of a downloaded file.
def verify_hash(downloaded_file, expected_hash):
    calculated_hash = calculate_hash(downloaded_file)
    return calculated_hash == expected_hash


# Command-line argument parsing.
parser = argparse.ArgumentParser(description="Verify downloaded file's hash.")
parser.add_argument("-f", "--file", dest="downloaded_file",
                    required=True, help="Path of the downloaded file")
parser.add_argument("--hash", dest="expected_hash",
                    required=True, help="Expected hash value")
args = parser.parse_args()

# Validate arguments and perform hash verification.
if not args.downloaded_file or not args.expected_hash:
    print(
        f"{Fore.RED}[-] Please specify the file and its hash for validation.")
    # Exit with a non-zero status to signal an error.
    sys.exit(1)
if verify_hash(args.downloaded_file, args.expected_hash):
    print(
        f"{Fore.GREEN}[+] Hash verification successful. The file is original.")
else:
    print(
        f"{Fore.RED}[-] Hash verification failed. This may indicate tampering or non-original software.")
Output:
C:\Users\kisha\PycharmProjects\gfg-integrity>python verify.py -f C:\Users\kisha\Downloads\Programs\vlc-3.0.20-win64.exe --hash d8055b6643651ca5b9ad58c438692a481483657f3f31624cdfa68b92e8394a57
[+] Hash verification occurred successfully. The software is original.
Command Prompt Verification
Conclusion
In conclusion, we looked at what data integrity means and why it matters for files that are transmitted or stored. We saw that a digest produced by a hash function such as MD5 or SHA-256 changes whenever the file content changes, and we built a small command-line script that computes the SHA-256 digest of a file and compares it with an expected value to verify the file's integrity.