Levenshtein distance measures how different two strings are: it is the minimum number of single-character insertions, deletions and substitutions needed to transform one string into the other.
Operations in Levenshtein distance are:
- Insertion: Adding a character to string A.
- Deletion: Removing a character from string A.
- Replacement: Replacing a character in string A with another character.
Let’s look at an example: String A, “kitten”, needs to be converted into String B, “sitting”, so we need to determine the minimum number of operations required.
- kitten → sitten (substitution of “s” for “k”)
- sitten → sittin (substitution of “i” for “e”)
- sittin → sitting (insertion of “g” at the end).
In this case, it took three operations, so the Levenshtein distance is 3.
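To make the three steps concrete, here is a minimal Python sketch that replays them as plain string operations. The helper names substitute and insert are hypothetical and exist only for this illustration; they are not part of any library.

```python
# Replay the three edits from the example as plain string operations.
# The helpers below are hypothetical, written only for this illustration.

def substitute(s: str, index: int, ch: str) -> str:
    """Return s with the character at position index replaced by ch."""
    return s[:index] + ch + s[index + 1:]


def insert(s: str, index: int, ch: str) -> str:
    """Return s with ch inserted before position index."""
    return s[:index] + ch + s[index:]


step1 = substitute("kitten", 0, "s")      # kitten -> sitten
step2 = substitute(step1, 4, "i")         # sitten -> sittin
step3 = insert(step2, len(step2), "g")    # sittin -> sitting

print(step1, step2, step3)
```

Each call corresponds to one edit operation, so three calls match the distance of 3.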
- Upper and lower bounds: The Levenshtein distance is always non-negative, and it is zero if and only if the two strings are identical. The largest possible Levenshtein distance between two strings of lengths m and n is max(m, n), which is reached when no character can be kept and every position of the longer string costs one substitution, insertion, or deletion.
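The bounds can be checked on a few sample pairs with a short Python sketch. The function name levenshtein and the sample pairs are assumptions made for this illustration; the distance itself is computed with a compact row-by-row dynamic programming loop.

```python
def levenshtein(a: str, b: str) -> int:
    """Compact row-by-row dynamic programming version of the distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(curr[j - 1] + 1,      # insert
                            prev[j] + 1,          # remove
                            prev[j - 1] + cost))  # replace or keep
        prev = curr
    return prev[-1]


# Sample pairs chosen only for illustration.
pairs = [("kitten", "sitting"), ("abc", "abc"), ("", "abc"), ("abc", "xyz")]

for a, b in pairs:
    d = levenshtein(a, b)
    assert 0 <= d <= max(len(a), len(b))   # lower and upper bound
    assert (d == 0) == (a == b)            # zero only for identical strings
    print(a, b, d)
```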
Applications of Levenshtein distance:
The Levenshtein distance has applications in various fields, such as:
- Autocorrect algorithms: Text editors, messaging applications, and keyboard apps such as Gboard and SwiftKey use the Levenshtein distance in their autocorrect features.
- Data cleaning: It is widely used in data cleaning and normalization tasks to reduce redundancy and identify similar records during data mining.
- Data clustering and classification: Clustering groups similar records together, while classification assigns class labels to similar records; the Levenshtein distance can serve as the similarity measure in both.
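As a rough illustration of the autocorrect use case, the sketch below picks the vocabulary word closest to a typed word. The function names levenshtein and suggest and the tiny vocabulary are hypothetical; real autocorrect systems combine edit distance with other signals such as word frequency.

```python
def levenshtein(a: str, b: str) -> int:
    """Row-by-row dynamic programming version of the distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def suggest(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry with the smallest distance to word."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


# Hypothetical vocabulary, for illustration only.
vocabulary = ["sitting", "kitchen", "mitten", "written"]
print(suggest("kitten", vocabulary))  # prints "mitten" (one substitution away)
```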
Relationship with other edit distance metrics:
Let’s see how Levenshtein distance differs from other distance metrics:
- Damerau-Levenshtein distance: It is similar to the Levenshtein distance, but it also allows the transposition of two adjacent characters as a fourth operation.
- Hamming distance: It can only be applied to strings of equal length, and it counts the number of positions at which the corresponding characters differ.
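The two related metrics can be sketched in Python as follows. Note the assumption: the second function implements the restricted variant usually called optimal string alignment (Levenshtein plus adjacent transpositions), not the unrestricted Damerau-Levenshtein distance. The function names are chosen for this illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # remove
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # replace or keep
            # Transposition of two adjacent characters counts as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)
    return dp[m][n]


print(hamming("karolin", "kathrin"))  # 3 differing positions
print(osa_distance("ca", "ac"))       # 1, a single transposition
```

With plain Levenshtein distance, "ca" to "ac" costs 2, which is where the transposition operation makes a difference.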
Now let’s see its implementation using different approaches:
1) Levenshtein distance using a recursive approach
In the recursive technique, we use a simple recursive function to calculate the Levenshtein distance. It compares the last characters of the two strings and, whenever they differ, recursively tries an insertion, a removal, and a replacement.
Below is the implementation for the above idea:
// C++ code for the above approach:
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int levenshteinRecursive( const string& str1,
const string& str2, int m, int n)
{ // str1 is empty
if (m == 0) {
return n;
}
// str2 is empty
if (n == 0) {
return m;
}
if (str1[m - 1] == str2[n - 1]) {
return levenshteinRecursive(str1, str2, m - 1,
n - 1);
}
return 1
+ min(
// Insert
levenshteinRecursive(str1, str2, m, n - 1),
min(
// Remove
levenshteinRecursive(str1, str2, m - 1,
n),
// Replace
levenshteinRecursive(str1, str2, m - 1,
n - 1)));
}
// Driver code
int main()
{ string str1 = "kitten" ;
string str2 = "sitting" ;
// Function Call
int distance = levenshteinRecursive(
str1, str2, str1.length(), str2.length());
cout << "Levenshtein Distance: " << distance << endl;
return 0;
}
// Java code for the above approach:
import java.io.*;
public class Solution {
public static int levenshteinRecursive(String str1,
String str2, int m, int n) {
// str1 is empty
if (m == 0 ) {
return n;
}
// str2 is empty
if (n == 0 ) {
return m;
}
if (str1.charAt(m - 1 ) == str2.charAt(n - 1 )) {
return levenshteinRecursive(str1, str2, m - 1 , n - 1 );
}
return 1 + Math.min(
// Insert
levenshteinRecursive(str1, str2, m, n - 1 ),
Math.min(
// Remove
levenshteinRecursive(str1, str2, m - 1 , n),
// Replace
levenshteinRecursive(str1, str2, m - 1 , n - 1 )
)
);
}
public static void main(String[] args) {
String str1 = "kitten" ;
String str2 = "sitting" ;
int distance = levenshteinRecursive(str1, str2, str1.length(), str2.length());
System.out.println( "Levenshtein Distance: " + distance);
}
}
# Python code for the above approach:
def levenshteinRecursive(str1, str2, m, n):
    # str1 is empty
    if m == 0:
        return n
    # str2 is empty
    if n == 0:
        return m
    if str1[m - 1] == str2[n - 1]:
        return levenshteinRecursive(str1, str2, m - 1, n - 1)
    return 1 + min(
        # Insert
        levenshteinRecursive(str1, str2, m, n - 1),
        min(
            # Remove
            levenshteinRecursive(str1, str2, m - 1, n),
            # Replace
            levenshteinRecursive(str1, str2, m - 1, n - 1))
    )


# Driver code
str1 = "kitten"
str2 = "sitting"
distance = levenshteinRecursive(str1, str2, len(str1), len(str2))
print("Levenshtein Distance:", distance)
// C# code for the above approach:
using System;
class Program
{ // Recursive function to calculate Levenshtein Distance
static int LevenshteinRecursive( string str1, string str2, int m, int n)
{
// If str1 is empty, the distance is the length of str2
if (m == 0)
{
return n;
}
// If str2 is empty, the distance is the length of str1
if (n == 0)
{
return m;
}
// If the last characters of the strings are the same
if (str1[m - 1] == str2[n - 1])
{
return LevenshteinRecursive(str1, str2, m - 1, n - 1);
}
// Calculate the minimum of three operations:
// Insert, Remove, and Replace
return 1 + Math.Min(
Math.Min(
// Insert
LevenshteinRecursive(str1, str2, m, n - 1),
// Remove
LevenshteinRecursive(str1, str2, m - 1, n)
),
// Replace
LevenshteinRecursive(str1, str2, m - 1, n - 1)
);
}
static void Main()
{
string str1 = "kitten" ;
string str2 = "sitting" ;
// Function Call
int distance = LevenshteinRecursive(str1, str2, str1.Length, str2.Length);
Console.WriteLine( "Levenshtein Distance: " + distance);
}
}
// JavaScript code for the above approach:
function levenshteinRecursive(str1, str2, m, n) {
// Base case: str1 is empty
if (m === 0) {
return n;
}
// Base case: str2 is empty
if (n === 0) {
return m;
}
// If the last characters of both
// strings are the same
if (str1[m - 1] === str2[n - 1]) {
return levenshteinRecursive(str1, str2, m - 1, n - 1);
}
// Calculate the minimum of three possible
// operations (insert, remove, replace)
return 1 + Math.min(
// Insert
levenshteinRecursive(str1, str2, m, n - 1),
// Remove
levenshteinRecursive(str1, str2, m - 1, n),
// Replace
levenshteinRecursive(str1, str2, m - 1, n - 1)
);
}
// Driver code
const str1 = "kitten";
const str2 = "sitting";
// Function Call
const distance = levenshteinRecursive(str1, str2, str1.length, str2.length);
console.log("Levenshtein Distance: " + distance);
Output:
Levenshtein Distance: 3
Time complexity: O(3^(m+n))
Auxiliary space: O(m+n), for the recursion stack
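The exponential running time comes from solving the same (m, n) subproblem many times. The Python sketch below counts the calls made by the plain recursion and compares them with a memoized variant. Memoization is a swapped-in technique here, not one of the approaches listed in this article, and the function names are chosen for this illustration.

```python
from functools import lru_cache


def count_naive_calls(str1, str2):
    """Run the plain recursion and count how many calls it makes."""
    calls = 0

    def rec(m, n):
        nonlocal calls
        calls += 1
        if m == 0:
            return n
        if n == 0:
            return m
        if str1[m - 1] == str2[n - 1]:
            return rec(m - 1, n - 1)
        return 1 + min(rec(m, n - 1), rec(m - 1, n), rec(m - 1, n - 1))

    return rec(len(str1), len(str2)), calls


def levenshtein_memo(str1, str2):
    """Same recurrence, but each (m, n) pair is solved only once."""

    @lru_cache(maxsize=None)
    def rec(m, n):
        if m == 0:
            return n
        if n == 0:
            return m
        if str1[m - 1] == str2[n - 1]:
            return rec(m - 1, n - 1)
        return 1 + min(rec(m, n - 1), rec(m - 1, n), rec(m - 1, n - 1))

    distance = rec(len(str1), len(str2))
    # cache misses equal the number of distinct subproblems evaluated
    return distance, rec.cache_info().misses


naive_distance, naive_calls = count_naive_calls("kitten", "sitting")
memo_distance, memo_subproblems = levenshtein_memo("kitten", "sitting")
print("naive calls:", naive_calls, "memoized subproblems:", memo_subproblems)
```

The memoized version can never evaluate more than (m+1)*(n+1) subproblems, which is the same table that the iterative approaches fill explicitly.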
2) Levenshtein distance using the iterative full-matrix approach
The iterative technique with a full matrix uses a 2D matrix to hold the intermediate results of the Levenshtein distance calculation. It begins with empty strings and iteratively fills the matrix row by row. It computes the minimum cost of insertions, deletions, and replacements based on the characters of both strings.
Below is the implementation for the above idea:
// C++ code for the above approach:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int levenshteinFullMatrix( const string& str1,
const string& str2)
{ int m = str1.length();
int n = str2.length();
vector<vector< int > > dp(m + 1, vector< int >(n + 1, 0));
for ( int i = 0; i <= m; i++) {
dp[i][0] = i;
}
for ( int j = 0; j <= n; j++) {
dp[0][j] = j;
}
for ( int i = 1; i <= m; i++) {
for ( int j = 1; j <= n; j++) {
if (str1[i - 1] == str2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1];
}
else {
dp[i][j] = 1
+ min(
// Insert
dp[i][j - 1],
min(
// Remove
dp[i - 1][j],
// Replace
dp[i - 1][j - 1]));
}
}
}
return dp[m][n];
}
// Driver code
int main()
{ string str1 = "kitten" ;
string str2 = "sitting" ;
// Function Call
int distance = levenshteinFullMatrix(str1, str2);
cout << "Levenshtein Distance: " << distance << endl;
return 0;
}
// Java code for the above approach:
import java.util.Arrays;
public class LevenshteinDistance {
// Function to calculate Levenshtein Distance between two strings
public static int levenshteinFullMatrix(String str1, String str2) {
int m = str1.length();
int n = str2.length();
// Create a 2D array to store the dynamic programming results
int [][] dp = new int [m + 1 ][n + 1 ];
// Initialize the base cases
for ( int i = 0 ; i <= m; i++) {
dp[i][ 0 ] = i;
}
for ( int j = 0 ; j <= n; j++) {
dp[ 0 ][j] = j;
}
// Fill in the DP array using the recurrence relation
for ( int i = 1 ; i <= m; i++) {
for ( int j = 1 ; j <= n; j++) {
if (str1.charAt(i - 1 ) == str2.charAt(j - 1 )) {
// Characters match, no operation needed
dp[i][j] = dp[i - 1 ][j - 1 ];
} else {
// Characters don't match, consider the minimum of insert, remove, and replace
dp[i][j] = 1 + Math.min(
// Insert
dp[i][j - 1 ],
Math.min(
// Remove
dp[i - 1 ][j],
// Replace
dp[i - 1 ][j - 1 ]));
}
}
}
// Result is stored in the bottom-right cell of the DP array
return dp[m][n];
}
public static void main(String[] args) {
String str1 = "kitten" ;
String str2 = "sitting" ;
// Function Call
int distance = levenshteinFullMatrix(str1, str2);
// Print the result
System.out.println( "Levenshtein Distance: " + distance);
}
}
# Python code for the above approach:
def levenshteinFullMatrix(str1, str2):
    m = len(str1)
    n = len(str2)
    # Initialize a matrix to store the edit distances
    dp = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
    # Initialize the first column with 0..m and the first row with 0..n
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    # Fill the matrix using dynamic programming to compute edit distances
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                # Characters match, no operation needed
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # Characters don't match, choose minimum cost among insertion, deletion, or substitution
                dp[i][j] = 1 + min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1])
    # Return the edit distance between the strings
    return dp[m][n]


# Driver code
str1 = "kitten"
str2 = "sitting"
# Function Call
distance = levenshteinFullMatrix(str1, str2)
print(f"Levenshtein Distance: {distance}")
// C# code for the above approach:
using System;
class LevenshteinDistance
{ // Function to calculate Levenshtein Distance using a full matrix approach
static int LevenshteinFullMatrix( string str1, string str2)
{
int m = str1.Length;
int n = str2.Length;
// Create a matrix to store distances
int [,] dp = new int [m + 1, n + 1];
// Initialize the first row and column of the matrix
for ( int i = 0; i <= m; i++)
{
dp[i, 0] = i; // Number of deletions required for str1 to become an empty string
}
for ( int j = 0; j <= n; j++)
{
dp[0, j] = j; // Number of insertions required for an empty string to become str2
}
// Fill in the matrix with minimum edit distances
for ( int i = 1; i <= m; i++)
{
for ( int j = 1; j <= n; j++)
{
if (str1[i - 1] == str2[j - 1])
{
dp[i, j] = dp[i - 1, j - 1]; // Characters match, no operation needed
}
else
{
// Choose the minimum of insert, delete, or replace operations
dp[i, j] = 1 + Math.Min(
dp[i, j - 1], // Insertion
Math.Min(
dp[i - 1, j], // Deletion
dp[i - 1, j - 1] // Replacement
)
);
}
}
}
return dp[m, n]; // Return the final edit distance
}
static void Main()
{
string str1 = "kitten" ;
string str2 = "sitting" ;
// Calculate Levenshtein Distance between str1 and str2
int distance = LevenshteinFullMatrix(str1, str2);
Console.WriteLine( "Levenshtein Distance: " + distance);
}
}
// JavaScript code for the above approach:
function levenshteinFullMatrix(str1, str2) {
const m = str1.length;
const n = str2.length;
const dp = new Array(m + 1).fill( null ).map(() => new Array(n + 1).fill(0));
// Initialize the first row
// and column of the matrix
for (let i = 0; i <= m; i++) {
dp[i][0] = i;
}
for (let j = 0; j <= n; j++) {
dp[0][j] = j;
}
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
if (str1[i - 1] === str2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1];
} else {
dp[i][j] = 1 + Math.min(
// Insert
dp[i][j - 1],
Math.min(
// Remove
dp[i - 1][j],
// Replace
dp[i - 1][j - 1]
)
);
}
}
}
return dp[m][n];
}
// Driver code
const str1 = "kitten";
const str2 = "sitting";
// Function Call
const distance = levenshteinFullMatrix(str1, str2);
console.log("Levenshtein Distance:", distance);
Output:
Levenshtein Distance: 3
Time complexity: O(m*n)
Auxiliary space: O(m*n)
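One thing the full matrix allows, at the cost of O(m*n) space, is walking back from the bottom-right cell to recover which operations were used. The following Python sketch is one way to do that under the same recurrence as above; the function names build_table and edit_script are hypothetical, and when several optimal edit sequences exist it simply returns one of them.

```python
def build_table(a, b):
    """Fill the full (len(a)+1) x (len(b)+1) distance matrix."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1])
    return dp


def edit_script(a, b):
    """Walk back from the bottom-right cell and list one optimal edit sequence."""
    dp = build_table(a, b)
    i, j = len(a), len(b)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            i, j = i - 1, j - 1                      # characters match, no edit
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"replace {a[i - 1]!r} with {b[j - 1]!r}")
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(f"insert {b[j - 1]!r}")
            j -= 1
        else:
            ops.append(f"remove {a[i - 1]!r}")
            i -= 1
    ops.reverse()
    return ops


for op in edit_script("kitten", "sitting"):
    print(op)
```

The number of operations returned always equals the value in the bottom-right cell of the matrix.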
3) Levenshtein distance using the iterative two-matrix-rows approach
By simply storing two rows of the matrix at a time, the iterative technique with two matrix rows reduces space complexity. It iterates through the strings row by row, storing the current and past calculations in two rows.
Below is the implementation for the above approach:
// C++ code for the above approach:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int levenshteinTwoMatrixRows( const string& str1,
const string& str2)
{ int m = str1.length();
int n = str2.length();
vector< int > prevRow(n + 1, 0);
vector< int > currRow(n + 1, 0);
for ( int j = 0; j <= n; j++) {
prevRow[j] = j;
}
for ( int i = 1; i <= m; i++) {
currRow[0] = i;
for ( int j = 1; j <= n; j++) {
if (str1[i - 1] == str2[j - 1]) {
currRow[j] = prevRow[j - 1];
}
else {
currRow[j] = 1
+ min(
// Insert
currRow[j - 1],
min(
// Remove
prevRow[j],
// Replace
prevRow[j - 1]));
}
}
prevRow = currRow;
}
// prevRow holds the last computed row (or the initial row when str1 is empty)
return prevRow[n];
}
// Driver code
int main()
{ string str1 = "kitten" ;
string str2 = "sitting" ;
// Function Call
int distance = levenshteinTwoMatrixRows(str1, str2);
cout << "Levenshtein Distance: " << distance;
return 0;
}
// Java code for the above approach:
import java.util.Arrays;
public class LevenshteinDistance {
// Method to calculate Levenshtein distance using two matrix rows
public static int levenshteinTwoMatrixRows(String str1, String str2) {
int m = str1.length();
int n = str2.length();
// Initializing two arrays to store the current and previous row values
int [] prevRow = new int [n + 1 ];
int [] currRow = new int [n + 1 ];
// Initializing the first row with increasing integers
for ( int j = 0 ; j <= n; j++) {
prevRow[j] = j;
}
// Looping through each character of str1
for ( int i = 1 ; i <= m; i++) {
// Initializing the first element of the current row with the row number
currRow[ 0 ] = i;
// Looping through each character of str2
for ( int j = 1 ; j <= n; j++) {
// If characters are equal, no operation needed, take the diagonal value
if (str1.charAt(i - 1 ) == str2.charAt(j - 1 )) {
currRow[j] = prevRow[j - 1 ];
} else {
// If characters are not equal, find the minimum value of insert, delete, or replace
currRow[j] = 1 + Math.min(currRow[j - 1 ], Math.min(prevRow[j], prevRow[j - 1 ]));
}
}
// Update prevRow with currRow values
prevRow = Arrays.copyOf(currRow, currRow.length);
}
// Return the final Levenshtein distance; prevRow holds the last computed row
// (or the initial row when str1 is empty)
return prevRow[n];
}
// Main method for testing
public static void main(String[] args) {
String str1 = "kitten" ;
String str2 = "sitting" ;
// Function Call
int distance = levenshteinTwoMatrixRows(str1, str2);
System.out.println( "Levenshtein Distance: " + distance);
}
}
# Python program for the above approach
def levenshtein_two_matrix_rows(str1, str2):
    # Get the lengths of the input strings
    m = len(str1)
    n = len(str2)
    # Initialize two rows for dynamic programming
    prev_row = [j for j in range(n + 1)]
    curr_row = [0] * (n + 1)
    # Dynamic programming to fill the matrix
    for i in range(1, m + 1):
        # Initialize the first element of the current row
        curr_row[0] = i
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                # Characters match, no operation needed
                curr_row[j] = prev_row[j - 1]
            else:
                # Choose the minimum cost operation
                curr_row[j] = 1 + min(
                    curr_row[j - 1],  # Insert
                    prev_row[j],      # Remove
                    prev_row[j - 1]   # Replace
                )
        # Update the previous row with the current row
        prev_row = curr_row.copy()
    # prev_row holds the last computed row (or the initial row when str1 is empty)
    return prev_row[n]


# Driver code
if __name__ == "__main__":
    # Example input strings
    str1 = "kitten"
    str2 = "sitting"
    # Function call to calculate Levenshtein distance
    distance = levenshtein_two_matrix_rows(str1, str2)
    # Print the result
    print("Levenshtein Distance:", distance)
# This code is contributed by Susobhan Akhuli
// C# program for the above approach
using System;
class LevenshteinDistance {
// Function to calculate Levenshtein distance between
// two strings
static int LevenshteinTwoMatrixRows( string str1,
string str2)
{
int m = str1.Length;
int n = str2.Length;
// Initialize two rows for dynamic programming
int [] prevRow = new int [n + 1];
int [] currRow = new int [n + 1];
// Initialization of the first row
for ( int j = 0; j <= n; j++) {
prevRow[j] = j;
}
// Dynamic programming to calculate Levenshtein
// distance
for ( int i = 1; i <= m; i++) {
// Initialize the current row with the value of
// i
currRow[0] = i;
for ( int j = 1; j <= n; j++) {
// If characters are the same, no operation
// is needed
if (str1[i - 1] == str2[j - 1]) {
currRow[j] = prevRow[j - 1];
}
else {
// Choose the minimum of three
// operations: insert, remove, or
// replace
currRow[j] = 1
+ Math.Min(
// Insert
currRow[j - 1],
Math.Min(
// Remove
prevRow[j],
// Replace
prevRow[j - 1]));
}
}
// Update the previous row for the next
// iteration
Array.Copy(currRow, prevRow, n + 1);
}
// prevRow holds the last computed row (or the initial
// row when str1 is empty), so it contains the distance
return prevRow[n];
}
// Main method to test the Levenshtein distance
// calculation
static void Main()
{
string str1 = "kitten" ;
string str2 = "sitting" ;
// Function Call
int distance = LevenshteinTwoMatrixRows(str1, str2);
Console.WriteLine( "Levenshtein Distance: "
+ distance);
}
} // This code is contributed by Susobhan Akhuli
// Function to calculate Levenshtein distance using two matrix rows
function levenshteinTwoMatrixRows(str1, str2) {
const m = str1.length;
const n = str2.length;
// Initialize two arrays to represent the matrix rows
let prevRow = new Array(n + 1).fill(0);
let currRow = new Array(n + 1).fill(0);
// Initialize the first row with consecutive numbers
for (let j = 0; j <= n; j++) {
prevRow[j] = j;
}
// Dynamic programming to fill the matrix
for (let i = 1; i <= m; i++) {
currRow[0] = i;
for (let j = 1; j <= n; j++) {
// Check if characters at the current positions are equal
if (str1[i - 1] === str2[j - 1]) {
currRow[j] = prevRow[j - 1]; // No operation required
} else {
// Choose the minimum of three possible operations (insert, remove, replace)
currRow[j] = 1 + Math.min(
currRow[j - 1], // Insert
prevRow[j], // Remove
prevRow[j - 1] // Replace
);
}
}
// Update the previous row with the current row for the next iteration
prevRow = [...currRow];
}
// prevRow holds the last computed row (or the initial row when str1 is empty)
return prevRow[n];
}
// Driver code
const str1 = "kitten";
const str2 = "sitting";
// Function call to calculate Levenshtein distance
const distance = levenshteinTwoMatrixRows(str1, str2);
// Print the result
console.log("Levenshtein Distance:", distance);
Output:
Levenshtein Distance: 3
Time complexity: O(m*n)
Auxiliary Space: O(n)
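Two small refinements of the two-row idea can be sketched in Python, assuming nothing beyond the recurrence used above: swapping the row references instead of copying a row on every iteration, and using the shorter string for the row length so the extra space becomes O(min(m, n)). The function name levenshtein_two_rows is chosen for this illustration.

```python
def levenshtein_two_rows(a: str, b: str) -> int:
    """Two-row distance that swaps rows and sizes them by the shorter string."""
    # The distance is symmetric, so the shorter string can define the row length.
    if len(b) > len(a):
        a, b = b, a
    prev = list(range(len(b) + 1))
    curr = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr[0] = i
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(curr[j - 1],   # insert
                                  prev[j],       # remove
                                  prev[j - 1])   # replace
        # Swap references instead of copying the whole row.
        prev, curr = curr, prev
    # After the swap, prev is the most recently filled row.
    return prev[len(b)]


print(levenshtein_two_rows("kitten", "sitting"))  # 3
```

Reading the result from the previous row after the swap also covers the case where one of the strings is empty and the loop body never runs.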