Python Program to check for almost similar Strings

Last Updated : 02 May, 2023

Given two strings, the task is to write a Python program that tests whether they are almost similar. Two strings are considered almost similar if, for every character, the absolute difference between its frequencies in the two strings is at most a threshold K.

Input : test_str1 = 'aabcdaa', test_str2 = 'abbaccd', K = 2
Output : True
Explanation : 'a' occurs 4 times in test_str1 and 2 times in test_str2; 4 - 2 = 2, which does not exceed K. The differences for all other characters are also within K, hence True.

Input : test_str1 = 'aabcdaaa', test_str2 = 'abbaccda', K = 3
Output : True
Explanation : 'a' occurs 5 times in test_str1 and 3 times in test_str2; 5 - 3 = 2, which does not exceed K. The differences for all other characters are also within K, hence True.
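
To make the criterion concrete, the following sketch tabulates the per-character differences for the first input. The helper name char_differences is hypothetical and is used only for illustration; it is not one of the methods described in this article.

```python
from collections import Counter


def char_differences(str1, str2):
    """Return {char: absolute frequency difference} over both strings.

    Hypothetical helper for illustration only.
    """
    cnt1, cnt2 = Counter(str1), Counter(str2)
    # A Counter returns 0 for missing keys, so characters present in only
    # one string need no special handling.
    return {ch: abs(cnt1[ch] - cnt2[ch]) for ch in sorted(set(str1) | set(str2))}


diffs = char_differences('aabcdaa', 'abbaccd')
print(diffs)                     # {'a': 2, 'b': 1, 'c': 1, 'd': 0}
print(max(diffs.values()) <= 2)  # True, the largest difference equals K = 2
```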

Method 1 : Using ascii_lowercase, dictionary comprehension, loop and abs()

In this method, we compute the frequency of every character in both strings using a dictionary comprehension and a loop. Next, we iterate over the lowercase ASCII letters and compare the two frequencies of each letter using abs(). If any difference is greater than K, the result is set to False.

Example

Python3

from string import ascii_lowercase
 
# function to compute frequencies
 
 
def get_freq(test_str):
 
    # starting at 0 count
    freqs = {char: 0 for char in ascii_lowercase}
 
    # counting frequencies
    for char in test_str:
        freqs[char] += 1
    return freqs
 
 
# initializing strings
test_str1 = 'aabcdaa'
test_str2 = "abbaccd"
 
# printing original strings
print("The original string 1 is : " + str(test_str1))
print("The original string 2 is : " + str(test_str2))
 
# initializing K
K = 2
 
# getting frequencies
freqs_1 = get_freq(test_str1)
freqs_2 = get_freq(test_str2)
 
# checking for frequencies
res = True
for char in ascii_lowercase:
    if abs(freqs_1[char] - freqs_2[char]) > K:
        res = False
        break
 
# printing result
print("Are strings similar ? : " + str(res))

Output:

The original string 1 is : aabcdaa
The original string 2 is : abbaccd
Are strings similar ? : True
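
Note that get_freq() assumes every character is a lowercase ASCII letter; any other character raises a KeyError. A minimal sketch of one way to relax this is shown below. It assumes that characters outside ascii_lowercase should simply be ignored, which the problem statement does not specify, and the names get_freq_safe and almost_similar are hypothetical.

```python
from string import ascii_lowercase


def get_freq_safe(test_str):
    # Hypothetical variant of get_freq: every lowercase letter starts at 0
    freqs = {char: 0 for char in ascii_lowercase}
    for char in test_str:
        # Assumption: skip anything that is not a lowercase ASCII letter
        if char in freqs:
            freqs[char] += 1
    return freqs


def almost_similar(str1, str2, k):
    f1, f2 = get_freq_safe(str1), get_freq_safe(str2)
    # similar only if no letter's frequency differs by more than k
    return all(abs(f1[c] - f2[c]) <= k for c in ascii_lowercase)


print(almost_similar('aabcdaa', 'abbaccd', 2))    # True
print(almost_similar('aab cd-aa', 'abbaccd', 2))  # True, space and '-' are ignored
print(almost_similar('aaaa', 'b', 2))             # False, 'a' differs by 4
```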

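Method 2 : Using Counter() and max()

In this method, Counter() computes the frequency of each character in both strings. Subtracting one Counter from the other keeps only the positive differences, so the subtraction is done in both directions and max() finds the largest difference in each. If either maximum is greater than K, the result is set to False.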

Example:

Python3

from collections import Counter
 
# initializing strings
test_str1 = 'aabcdaa'
test_str2 = "abbaccd"
 
# printing original strings
print("The original string 1 is : " + str(test_str1))
print("The original string 2 is : " + str(test_str2))
 
# initializing K
K = 2
 
# extracting frequencies
cnt1 = Counter(test_str1.lower())
cnt2 = Counter(test_str2.lower())
 
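# getting maximum difference
# default=0 avoids a ValueError when a subtraction yields an empty Counter
# (for example, when both strings are identical)
res = True
if (max((cnt1 - cnt2).values(), default=0) > K
        or max((cnt2 - cnt1).values(), default=0) > K):
    res = False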
 
# printing result
print("Are strings similar ? : " + str(res))

Output:

The original string 1 is : aabcdaa
The original string 2 is : abbaccd
Are strings similar ? : True

Time Complexity: O(n)
Auxiliary Space: O(n)
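
Two subtractions are needed because subtracting Counter objects discards zero and negative results, so cnt1 - cnt2 only reports characters that are more frequent in the first string. The sketch below shows this on the example input and wraps the check in a hypothetical function, almost_similar_counter. Passing default=0 to max() covers the case where a subtraction produces an empty Counter.

```python
from collections import Counter

cnt1 = Counter('aabcdaa')
cnt2 = Counter('abbaccd')

# Counter subtraction keeps only positive counts
print(dict(cnt1 - cnt2))  # {'a': 2}
print(dict(cnt2 - cnt1))  # {'b': 1, 'c': 1}


def almost_similar_counter(str1, str2, k):
    # Hypothetical wrapper around the Counter-based check
    c1, c2 = Counter(str1), Counter(str2)
    surplus_1 = max((c1 - c2).values(), default=0)  # excess in str1
    surplus_2 = max((c2 - c1).values(), default=0)  # excess in str2
    return max(surplus_1, surplus_2) <= k


print(almost_similar_counter('aabcdaa', 'abbaccd', 2))  # True
print(almost_similar_counter('abc', 'abc', 0))          # True, both subtractions are empty
print(almost_similar_counter('aaaa', 'a', 2))           # False, 'a' differs by 3
```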

Method 3 : Using list comprehension

Approach:

In this program, we define a function is_almost_similar_using_list_comprehension that takes three parameters: test_str1, test_str2, and k.

First, we collect the set of characters that occur in either string.
Next, we use a list comprehension with count() and abs() to compute the absolute frequency difference of each character.
Finally, we return True if every difference is less than or equal to k, and False otherwise.

Python3

def is_almost_similar_using_list_comprehension(test_str1, test_str2, k):
    # consider every character that appears in either string
    chars = set(test_str1) | set(test_str2)
    # absolute frequency difference of each character
    diffs = [abs(test_str1.count(char) - test_str2.count(char)) for char in chars]
    # similar only if no single difference exceeds k
    return all(diff <= k for diff in diffs)
 
# Testing
test_str1 = 'aabcdaaa'
test_str2 = 'abbaccda'
k = 3
print(f"Input strings: '{test_str1}', '{test_str2}'")
print(f"Value of K: {k}")
print(f"Output: {is_almost_similar_using_list_comprehension(test_str1, test_str2, k)}")

Output

Input strings: 'aabcdaaa', 'abbaccda'
Value of K: 3
Output: True

The time complexity of the is_almost_similar_using_list_comprehension function is O(n * u), where n is the combined length of test_str1 and test_str2 and u is the number of distinct characters, because each count() call scans a whole string. This reaches O(n^2) when most characters are distinct.
The space complexity of the function is O(u), since the lists hold one value per distinct character.
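
The per-character criterion from the problem statement and a total-difference criterion can give different answers, so it matters which one is implemented. The sketch below compares the two on the second example input; the helper name per_char_diffs is hypothetical.

```python
def per_char_diffs(str1, str2):
    # Hypothetical helper: one absolute difference per distinct character,
    # in sorted character order so the result is deterministic
    chars = sorted(set(str1) | set(str2))
    return [abs(str1.count(ch) - str2.count(ch)) for ch in chars]


diffs = per_char_diffs('aabcdaaa', 'abbaccda')
print(diffs)            # [2, 1, 1, 0]
print(max(diffs) <= 3)  # True: no single character differs by more than K = 3
print(sum(diffs) <= 3)  # False: the differences add up to 4
```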

Method 4 : Using set() and count()

Create a set of the unique characters that occur in either string using set(test_str1) | set(test_str2).
For each unique character in the set, count its frequency in both strings using the count() method.
If the absolute difference in frequencies is greater than K for any character, the strings are not similar.

Python3

# function to check whether two strings are almost similar
def are_strings_similar(test_str1, test_str2, K):
    # getting unique characters from both strings, so that characters
    # present only in test_str2 are checked as well
    unique_chars = set(test_str1) | set(test_str2)
 
    # checking for frequencies
    for char in unique_chars:
        freq1 = test_str1.count(char)
        freq2 = test_str2.count(char)
        if abs(freq1 - freq2) > K:
            return False
    return True
 
# initializing strings
test_str1 = 'aabcdaa'
test_str2 = "abbaccd"
 
# printing original strings
print("The original string 1 is : " + str(test_str1))
print("The original string 2 is : " + str(test_str2))
 
# initializing K
K = 2
 
# checking if strings are similar
res = are_strings_similar(test_str1, test_str2, K)
 
# printing result
print("Are strings similar ? : " + str(res))

Output

The original string 1 is : aabcdaa
The original string 2 is : abbaccd
Are strings similar ? : True

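Time Complexity: O(n * u), where n is the combined length of the strings and u is the number of unique characters checked, because each count() call scans a whole string. This is O(n^2) in the worst case.
Auxiliary Space: O(u), where u is the number of unique characters stored in the set.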
