Birthday attack in Cryptography

Last Updated : 31 Aug, 2023

Prerequisite – Birthday paradox
A birthday attack is a type of cryptographic attack that belongs to the class of brute-force attacks. It exploits the mathematics behind the birthday problem in probability theory. Its success depends on the fact that, when random inputs are mapped onto a fixed number of possible outputs, collisions are far more likely than intuition suggests, as described by the birthday paradox.

Birthday paradox problem –
Let us consider the example of a classroom of 30 students and a teacher. The teacher wishes to find pairs of students that have the same birthday, and so asks for everyone’s birthday. Intuitively, the chance of finding such a pair may seem small. For example, if the teacher fixes a particular date, say October 10, then the probability that at least one student is born on that day is 1 - (364/365)³⁰, which is about 7.9%. However, the probability that at least two students share a birthday is around 70%, using the following formula:

1 - 365!/((365 - n)! * 365ⁿ)  (substituting n = 30 here)

Derivation of the above term:

Assumptions –
1. Assuming a non-leap year (hence 365 days).
2. Assuming that a person has an equally likely chance of being born on any day of the year.
Let us consider n = 2.
P(two people have the same birthday) = 1 - P(two people have different birthdays)
    = 1 - (365/365)*(364/365)
    = 1 - 1*(364/365)
    = 1 - 364/365
    = 1/365.
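So for n people, the probability that all of them have different birthdays is:
P(n people have different birthdays) = (365/365)*((365-1)/365)*((365-2)/365)*...*((365-n+1)/365)
    = 365!/((365-n)! * 365ⁿ)
Subtracting this from 1 gives the probability that at least two of the n people share a birthday.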
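
As a quick numerical check of the derivation, the product can be evaluated directly instead of through factorials. The sketch below is illustrative only; the function names `prob_shared_birthday` and `prob_specific_date` are made up for this example and rely on the same two assumptions as above (365 days, uniform birthdays).

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different


def prob_specific_date(n, days=365):
    """Probability that at least one of n people is born on one fixed date."""
    return 1.0 - ((days - 1) / days) ** n


if __name__ == "__main__":
    print(f"n = 2, shared birthday : {prob_shared_birthday(2):.5f}")
    print(f"n = 30, shared birthday: {prob_shared_birthday(30):.3f}")
    print(f"n = 30, fixed date     : {prob_specific_date(30):.3f}")
```

For n = 2 this reproduces 1/365, and for n = 30 it gives roughly 0.71 for a shared birthday against roughly 0.08 for a fixed date.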

Hash function –
A hash function H is a transformation that takes a variable-sized input m and returns a fixed-size string called a hash value (h = H(m)). Hash functions chosen in cryptography must satisfy the following requirements:

The input is of variable length,
The output has a fixed length,
H(x) is relatively easy to compute for any given x,
H(x) is one-way,
H(x) is collision-free.

A hash function H is said to be one-way if it is hard to invert, where “hard to invert” means that given a hash value h, it is computationally infeasible to find some input x such that H(x) = h.

If, given a message x, it is computationally infeasible to find a message y not equal to x such that H(x) = H(y) then H is said to be a weakly collision-free hash function.

A strongly collision-free hash function H is one for which it is computationally infeasible to find any two messages x and y such that H(x) = H(y).
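
The first two requirements (variable-length input, fixed-length output) are easy to observe in practice. The following minimal sketch uses SHA-256 from Python's standard `hashlib` module purely as an example hash function; the helper name `digest_bits` is an assumption made for this illustration.

```python
import hashlib


def digest_bits(message: bytes) -> int:
    """Return the length, in bits, of the SHA-256 digest of message."""
    return len(hashlib.sha256(message).digest()) * 8


if __name__ == "__main__":
    # Inputs of very different sizes all map to a digest of the same size
    for message in (b"", b"abc", b"x" * 1_000_000):
        print(len(message), "bytes in ->", digest_bits(message), "bits out")
```

Because the set of possible inputs is far larger than the set of possible outputs, collisions must exist; the collision-free properties only require that they be computationally infeasible to find.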

Let H: M → {0, 1}ⁿ be a hash function with |M| >> 2ⁿ.

The following generic algorithm finds a collision using O(2^(n/2)) hash evaluations.

Algorithm:

1. Choose 2^(n/2) random messages in M: m₁, m₂, …, m_(2^(n/2)).
2. For i = 1, 2, …, 2^(n/2), compute t_i = H(m_i) ∈ {0, 1}ⁿ.
3. Look for a collision (t_i = t_j with i ≠ j). If none is found, go back to step 1.
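
The three steps can be sketched in Python as follows. This is a toy under stated assumptions, not a practical attack: the hash is SHA-256 truncated to `N_BITS = 16` bits so that the search finishes instantly, the messages are random 8-byte strings, and the names `toy_hash` and `find_collision` are invented for this example.

```python
import hashlib
import os

N_BITS = 16  # toy digest length n (assumption: kept tiny so the demo is fast)


def toy_hash(message: bytes) -> int:
    """SHA-256 truncated to its top N_BITS bits, standing in for H."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - N_BITS)


def find_collision():
    batch_size = 2 ** (N_BITS // 2)
    while True:
        # Step 1: choose 2^(n/2) random messages
        messages = [os.urandom(8) for _ in range(batch_size)]
        # Steps 2 and 3: compute t_i = H(m_i) and look for t_i = t_j
        seen = {}
        for m in messages:
            t = toy_hash(m)
            if t in seen and seen[t] != m:
                return seen[t], m
            seen[t] = m
        # No collision in this batch: go back to step 1


if __name__ == "__main__":
    m1, m2 = find_collision()
    print(m1.hex(), m2.hex(), "-> both hash to", toy_hash(m1))
```

Storing the hashes in a dictionary makes the collision lookup in step 3 take constant time per message, at the cost of memory proportional to the batch size.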

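We consider the following experiment. From a set of H possible values, we choose n values uniformly at random, with repetition allowed (here n denotes the number of values drawn, not the digest length). Let p(n; H) be the probability that during this experiment at least one value is chosen more than once. This probability is given by the first line below and can be approximated as in the second:

p(n; H) = 1 - ((H-1)/H) * ((H-2)/H) * ... * ((H-n+1)/H)
p(n; H) ≈ 1 - e^(-n(n-1)/(2H)) ≈ 1 - e^(-n²/(2H))

A collision therefore becomes likely once the number of draws is on the order of √H; for a hash with H = 2^b possible outputs, this is about 2^(b/2) evaluations.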
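
To see how close the exponential approximation 1 - e^(-n(n-1)/(2H)) is to the exact product, both can be computed side by side. The sketch below is illustrative; the function names are assumptions for this example, and `draws_for_half` simply solves 1 - e^(-n²/(2H)) = 1/2 for n.

```python
import math


def collision_probability_exact(n, H):
    """Exact p(n; H): at least one repeat among n uniform draws from H values."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (H - i) / H
    return 1.0 - p_distinct


def collision_probability_approx(n, H):
    """Approximation p(n; H) ~ 1 - e^(-n(n-1)/(2H))."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * H))


def draws_for_half(H):
    """Number of draws at which 1 - e^(-n^2/(2H)) reaches 1/2."""
    return math.sqrt(2 * math.log(2) * H)


if __name__ == "__main__":
    print(collision_probability_exact(23, 365))
    print(collision_probability_approx(23, 365))
    print(draws_for_half(2 ** 64))
```

With H = 365 the two values agree to within about one percentage point at n = 23, and for a 64-bit output space the 50% point lies between 2³² and 2³³ draws.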

Digital signature susceptibility –
Digital signatures can be susceptible to birthday attacks. A message m is typically signed by first computing H(m), where H is a cryptographic hash function, and then using some secret key to sign H(m). Suppose Alice wants to trick Bob into signing a fraudulent contract. Alice prepares a fair contract m and a fraudulent one m’. She then finds a number of positions where m can be changed without changing the meaning, such as inserting commas or empty lines, using one versus two spaces after a sentence, substituting synonyms, etc. By combining these changes she can create a huge number of variations on m which are all fair contracts.

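Similarly, Alice creates a huge number of variations on the fraudulent contract m’. She then hashes all of these variations and searches for a version of the fair contract and a version of the fraudulent contract that have the same hash value, that is, H(m) = H(m’). Alice can now present the fair version m to Bob for signing. After Bob has signed, Alice takes the signature and attaches it to the fraudulent contract. Because both versions have the same hash, the signature appears to prove that Bob has signed the fraudulent contract.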

To avoid such an attack, the output of the hash function should be long enough that the birthday attack becomes computationally infeasible.
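
The contract scenario can be sketched in a few lines of Python. Everything here is a hypothetical toy: the two contract texts are invented, the only variation used is one versus two spaces between words, and the hash is SHA-256 truncated to `HASH_BITS = 16` bits so that a match is found among a few thousand variants. Against a full-length digest the same search would be infeasible.

```python
import hashlib
from itertools import product

HASH_BITS = 16  # deliberately tiny digest (assumption for the demo)


def tiny_hash(text: str) -> int:
    """SHA-256 truncated to its top HASH_BITS bits."""
    digest = hashlib.sha256(text.encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - HASH_BITS)


def variations(words):
    """Yield every variant of the text with one or two spaces in each gap."""
    for gaps in product((" ", "  "), repeat=len(words) - 1):
        yield "".join(w + g for w, g in zip(words, gaps)) + words[-1]


def find_matching_pair(fair_words, fraud_words):
    """Return (fair_variant, fraud_variant) with equal tiny_hash, or None."""
    seen = {tiny_hash(v): v for v in variations(fair_words)}
    for candidate in variations(fraud_words):
        match = seen.get(tiny_hash(candidate))
        if match is not None:
            return match, candidate
    return None


# Hypothetical contracts: 13 words each, so 2^12 variants per contract
fair = "I agree to pay Alice ten dollars on the first day of May".split()
fraud = "I agree to pay Alice ten thousand dollars on the first of May".split()
pair = find_matching_pair(fair, fraud)

if __name__ == "__main__":
    fair_variant, fraud_variant = pair
    print(repr(fair_variant), tiny_hash(fair_variant))
    print(repr(fraud_variant), tiny_hash(fraud_variant))
```

Each contract yields 4096 variants, so there are about 16 million fair/fraudulent pairs to compare against only 65536 possible hash values, which is why a matching pair turns up so easily here.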

Example:

Let’s demonstrate a birthday attack in Python using the MD5 hash function. A generic birthday search against the full 128-bit MD5 digest would need roughly 2⁶⁴ hash evaluations, which is not feasible on an ordinary machine, so the demo keeps only the first 24 bits (6 hexadecimal digits) of each digest:

Python

import hashlib
import random
import string

# Number of hex digits of the MD5 digest to keep (6 hex digits = 24 bits).
# The full 128-bit digest would need about 2^64 attempts.
TRUNCATED_HEX_DIGITS = 6

# Function to generate a random string of a given length
def generate_random_string(length):
    charset = string.ascii_letters + string.digits
    return ''.join(random.choice(charset) for _ in range(length))

# Function to compute the truncated MD5 hash of a string
def truncated_md5(message):
    return hashlib.md5(message.encode()).hexdigest()[:TRUNCATED_HEX_DIGITS]

# Function to perform the Birthday Attack
def birthday_attack():
    hash_dict = {}
    num_attempts = 0

    while True:
        num_attempts += 1
        random_string = generate_random_string(10)
        hash_value = truncated_md5(random_string)

        # The same string drawn twice is not a collision, so require different inputs
        if hash_value in hash_dict and hash_dict[hash_value] != random_string:
            print(f"Collision found after {num_attempts} attempts!")
            print(f"Original String 1: {hash_dict[hash_value]}")
            print(f"Original String 2: {random_string}")
            break

        hash_dict[hash_value] = random_string

# Example usage
if __name__ == "__main__":
    birthday_attack()

Explanation:

The 'generate_random_string()' function generates a random string of a given length containing uppercase and lowercase letters, as well as digits.
The 'truncated_md5()' function calculates the MD5 hash of a string using 'hashlib.md5()' and keeps only its first 6 hexadecimal digits.
The 'birthday_attack()' function performs the actual attack. It keeps generating random strings, calculates their truncated hash values, and checks if the hash value is already present in 'hash_dict'. If a collision is found (two different inputs with the same hash), it prints the original strings and exits the loop.

Output:

The exact values differ from run to run because the inputs are random. The output has the following form:

Collision found after <number of attempts> attempts!
Original String 1: <first string>
Original String 2: <second string>

In this example, we simulated a birthday attack by generating random strings and calculating their truncated MD5 hash values. With 2²⁴ possible hash values, a collision typically appears after a few thousand attempts, on the order of 2¹² = 4096, rather than the millions a naive estimate would suggest. This demonstrates why digest length matters: an n-bit hash offers only about n/2 bits of resistance against a generic birthday attack. For this reason, it is generally recommended to use stronger hash functions with longer outputs, like SHA-3, in practical applications.
