Java Program to Implement Levenshtein Distance Computing Algorithm
The Levenshtein distance, also called the edit distance, is the minimum number of single-character operations required to transform one string into another.
The allowed operations, applied one at a time, are:
- Replace a character.
- Delete a character.
- Insert a character.
Examples:
Input: str1 = “glomax”, str2 = “folmax”
Output: 3
str1 is converted to str2 by replacing ‘g’ with ‘o’, deleting the second ‘o’, and inserting ‘f’ at the beginning. There is no way to do it with fewer than three edits.
Input: s1 = “GIKY”, s2 = “GEEKY”
Output: 2
s1 is converted to s2 by inserting ‘E’ right after ‘G’, and replacing ‘I’ with ‘E’.
This problem can be solved in two ways:
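To check that the edit sequences described in the two examples really produce the target strings, the following sketch replays them one operation at a time with `StringBuilder`. The class and method names (`EditReplay`, `replayGlomax`, `replayGiky`) are hypothetical and used only for illustration.

```java
// Replays the edit sequences from the two examples, one operation at a time,
// to confirm that they transform the source string into the target string.
class EditReplay {

    // "glomax" -> "folmax" using exactly three edits
    static String replayGlomax() {
        StringBuilder sb = new StringBuilder("glomax");
        sb.setCharAt(0, 'o');   // replace 'g' with 'o'     -> "olomax"
        sb.deleteCharAt(2);     // delete the second 'o'    -> "olmax"
        sb.insert(0, 'f');      // insert 'f' at the front  -> "folmax"
        return sb.toString();
    }

    // "GIKY" -> "GEEKY" using exactly two edits
    static String replayGiky() {
        StringBuilder sb = new StringBuilder("GIKY");
        sb.insert(1, 'E');      // insert 'E' right after 'G' -> "GEIKY"
        sb.setCharAt(2, 'E');   // replace 'I' with 'E'       -> "GEEKY"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replayGlomax()); // prints folmax
        System.out.println(replayGiky());   // prints GEEKY
    }
}
```

This only shows that the sequences are valid; the claim that no shorter sequence exists is what the algorithms below establish.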
- Using Recursion.
- Using Dynamic Programming.
Method 1: Recursive Approach
Let’s consider an example.
Given two strings s1 = “sunday” and s2 = “saturday”, we want to convert “sunday” into “saturday” with the minimum number of edits.
- Let ‘i’ and ‘j’ be the lengths of the prefixes of s1 and s2 under consideration (equivalently, the indices of their last characters when string indices start at 1).
- Let us pick i = 2 and j = 4, i.e. the prefixes are ‘su’ and ‘satu’ respectively. The rightmost characters of these prefixes can be aligned in the following ways.
- Possible Case 1 (Match): Align the characters ‘u’ and ‘u’. They are equal, so no edit is required. We are left with the problem for i = 1 and j = 3, so we proceed to find Levenshtein distance(i-1, j-1).
- Possible Case 2 (Deletion): Align the rightmost character of the first string with no character from the second string. This requires a deletion. We are left with the problem for i = 1 and j = 4, so we proceed to find Levenshtein distance(i-1, j).
- Possible Case 3 (Insertion): Align the rightmost character of the second string with no character from the first string. This requires an insertion. We are left with the problem for i = 2 and j = 3, so we proceed to find Levenshtein distance(i, j-1).
- We assume that the character inserted into the first string is the same as the rightmost character of the second string.
- Possible Case 4 (Replacement): Align the rightmost characters of both strings when they differ. This requires a substitution. We are left with the problem for i = 1 and j = 3, so we proceed to find Levenshtein distance(i-1, j-1).
- We assume that the replaced character in the first string becomes the rightmost character of the second string.
- The distance for (i, j) is the minimum over all applicable cases, where Case 1 adds 0 edits and Cases 2, 3 and 4 each add 1 edit.
- Base cases: if i = 0, the answer is j (insert every remaining character), and if j = 0, the answer is i (delete every remaining character).
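The cases above work on prefix lengths i and j and look at the rightmost characters. The following is a minimal sketch of that exact recurrence (the class name `LevenshteinPrefixRecursive` is hypothetical). When the rightmost characters match, it takes Case 1 directly; this is a standard shortcut, since a free match is never worse than deleting, inserting or replacing at that position.

```java
// Direct transcription of the case analysis: i and j are prefix lengths,
// and each call inspects the rightmost characters s1[i-1] and s2[j-1].
class LevenshteinPrefixRecursive {

    static int distance(String s1, String s2) {
        return distance(s1, s2, s1.length(), s2.length());
    }

    static int distance(String s1, String s2, int i, int j) {
        if (i == 0) return j;   // base case: insert the remaining j characters
        if (j == 0) return i;   // base case: delete the remaining i characters

        if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
            // Case 1 (Match): no edit needed, solve (i-1, j-1)
            return distance(s1, s2, i - 1, j - 1);
        }
        int delete  = distance(s1, s2, i - 1, j);      // Case 2 (Deletion)
        int insert  = distance(s1, s2, i, j - 1);      // Case 3 (Insertion)
        int replace = distance(s1, s2, i - 1, j - 1);  // Case 4 (Replacement)
        return 1 + Math.min(replace, Math.min(insert, delete));
    }

    public static void main(String[] args) {
        System.out.println(distance("sunday", "saturday")); // prints 3
    }
}
```

This version is still exponential in the worst case; it simply mirrors the explanation more closely than the front-to-back implementation that follows.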
Recursive Implementation:
Java
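import java.util.Arrays;

class LevenshteinDistanceRecursive {

    // Returns the minimum number of edits needed to turn str1 into str2.
    // This version compares the FIRST characters instead of the last ones;
    // the recurrence is the same as the case analysis above, just mirrored.
    static int compute_Levenshtein_distance(String str1, String str2)
    {
        // str1 is empty: insert every character of str2
        if (str1.isEmpty()) {
            return str2.length();
        }
        // str2 is empty: delete every character of str1
        if (str2.isEmpty()) {
            return str1.length();
        }
        // Match or replace the first characters, then solve the rest
        int replace = compute_Levenshtein_distance(str1.substring(1), str2.substring(1))
                      + NumOfReplacement(str1.charAt(0), str2.charAt(0));
        // Insert str2's first character in front of str1
        int insert = compute_Levenshtein_distance(str1, str2.substring(1)) + 1;
        // Delete str1's first character
        int delete = compute_Levenshtein_distance(str1.substring(1), str2) + 1;
        return minm_edits(replace, insert, delete);
    }

    // Cost of aligning c1 with c2: 0 if equal, 1 if a replacement is needed
    static int NumOfReplacement(char c1, char c2)
    {
        return c1 == c2 ? 0 : 1;
    }

    // Smallest of the given values
    static int minm_edits(int... nums)
    {
        return Arrays.stream(nums).min().orElse(Integer.MAX_VALUE);
    }

    public static void main(String[] args)
    {
        String s1 = "glomax";
        String s2 = "folmax";
        System.out.println(compute_Levenshtein_distance(s1, s2)); // prints 3
    }
}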
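Time Complexity: O(3^(m+n)) in the worst case, where m and n are the lengths of the two strings. Each call branches into three recursive calls, and each call shortens at least one of the strings, so the recursion can be up to m + n levels deep.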
Method 2: Dynamic Programming Approach
If we draw the recursion tree of the above solution, we can see that the same sub-problems are computed again and again. Dynamic programming applies when such overlapping sub-problems exist: each solution can be stored (memoized) once and reused instead of being recomputed.
- The memoized version follows the top-down approach: it breaks the problem into sub-problems recursively and stores each computed value so that it is never calculated twice.
- We can also solve this problem bottom-up: we solve the smallest sub-problems first and then build the solutions of larger sub-problems from them.
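To make the repeated work visible, the following sketch instruments the naive recursion with a counter per (i, j) prefix pair. The class and field names (`LevenshteinCallCounter`, `calls`, `totalCalls`) are hypothetical. It prints the total number of calls next to the number of distinct sub-problems; the gap between the two is the redundant work that dynamic programming removes.

```java
import java.util.HashMap;
import java.util.Map;

// Counts how often the naive recursion solves each (i, j) sub-problem.
class LevenshteinCallCounter {

    static final Map<String, Integer> calls = new HashMap<>(); // "i,j" -> times solved
    static long totalCalls = 0;

    static int distance(String s1, String s2, int i, int j) {
        totalCalls++;
        calls.merge(i + "," + j, 1, Integer::sum); // record this visit
        if (i == 0) return j;
        if (j == 0) return i;
        int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
        return Math.min(distance(s1, s2, i - 1, j - 1) + cost,   // match / replace
               Math.min(distance(s1, s2, i - 1, j) + 1,          // delete
                        distance(s1, s2, i, j - 1) + 1));        // insert
    }

    public static void main(String[] args) {
        String s1 = "sunday", s2 = "saturday";
        int d = distance(s1, s2, s1.length(), s2.length());
        System.out.println("distance             = " + d);
        System.out.println("total calls          = " + totalCalls);
        System.out.println("distinct subproblems = " + calls.size());
        System.out.println("times (3, 3) solved  = " + calls.get("3,3"));
    }
}
```

At most (m + 1) * (n + 1) distinct sub-problems exist, while the total number of calls grows exponentially.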
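The top-down (memoized) approach is not implemented in this article, so the following is a minimal sketch of it, assuming a table `memo` where -1 marks a sub-problem that has not been solved yet. The class name `LevenshteinDistanceMemo` and the helper `solve` are hypothetical.

```java
import java.util.Arrays;

// Top-down dynamic programming: the recursion from Method 1 plus a cache.
class LevenshteinDistanceMemo {

    static int distance(String s1, String s2) {
        // memo[i][j] caches the distance between the first i chars of s1
        // and the first j chars of s2; -1 means "not computed yet"
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
        return solve(s1, s2, s1.length(), s2.length(), memo);
    }

    private static int solve(String s1, String s2, int i, int j, int[][] memo) {
        if (i == 0) return j;                      // insert the rest of s2
        if (j == 0) return i;                      // delete the rest of s1
        if (memo[i][j] != -1) return memo[i][j];   // reuse a stored answer

        int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
        int result = Math.min(solve(s1, s2, i - 1, j - 1, memo) + cost, // match / replace
                     Math.min(solve(s1, s2, i - 1, j, memo) + 1,        // delete
                              solve(s1, s2, i, j - 1, memo) + 1));      // insert
        memo[i][j] = result;                       // store before returning
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distance("glomax", "folmax")); // prints 3
        System.out.println(distance("GIKY", "GEEKY"));    // prints 2
    }
}
```

Each (i, j) pair is now solved once, giving O(m*n) time. The recursion can still be up to m + n calls deep, so very long strings may overflow the call stack; the bottom-up version below avoids recursion entirely.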
Dynamic Programming Implementation (Optimised approach)
Java
import java.util.Arrays;

class LevenshteinDistanceDP {

    // dp[i][j] = edit distance between the first i characters of str1
    // and the first j characters of str2
    static int compute_Levenshtein_distanceDP(String str1, String str2)
    {
        int[][] dp = new int[str1.length() + 1][str2.length() + 1];

        for (int i = 0; i <= str1.length(); i++) {
            for (int j = 0; j <= str2.length(); j++) {
                if (i == 0) {
                    // Empty prefix of str1: insert all j characters
                    dp[i][j] = j;
                }
                else if (j == 0) {
                    // Empty prefix of str2: delete all i characters
                    dp[i][j] = i;
                }
                else {
                    dp[i][j] = minm_edits(
                        // match or replace the last characters
                        dp[i - 1][j - 1]
                            + NumOfReplacement(str1.charAt(i - 1), str2.charAt(j - 1)),
                        // delete str1's last character
                        dp[i - 1][j] + 1,
                        // insert str2's last character
                        dp[i][j - 1] + 1);
                }
            }
        }
        return dp[str1.length()][str2.length()];
    }

    // Cost of aligning c1 with c2: 0 if equal, 1 if a replacement is needed
    static int NumOfReplacement(char c1, char c2)
    {
        return c1 == c2 ? 0 : 1;
    }

    // Smallest of the given values
    static int minm_edits(int... nums)
    {
        return Arrays.stream(nums).min().orElse(Integer.MAX_VALUE);
    }

    public static void main(String[] args)
    {
        String s1 = "glomax";
        String s2 = "folmax";
        System.out.println(compute_Levenshtein_distanceDP(s1, s2)); // prints 3
    }
}
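It can help to look at the table the bottom-up loop fills in. The following sketch rebuilds the same `dp` table and prints it with the characters of each string as row and column labels; the class and method names (`LevenshteinTablePrinter`, `buildTable`, `printTable`) are hypothetical.

```java
// Builds and prints the dp table used by the bottom-up implementation.
class LevenshteinTablePrinter {

    // Same recurrence as compute_Levenshtein_distanceDP
    static int[][] buildTable(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                if (i == 0) {
                    dp[i][j] = j;                  // insert j characters
                } else if (j == 0) {
                    dp[i][j] = i;                  // delete i characters
                } else {
                    int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                    dp[i][j] = Math.min(dp[i - 1][j - 1] + cost,
                               Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1));
                }
            }
        }
        return dp;
    }

    static void printTable(String s1, String s2) {
        int[][] dp = buildTable(s1, s2);
        // Header: room for the row label and the empty-prefix column, then s2
        StringBuilder header = new StringBuilder("      ");
        for (char c : s2.toCharArray()) {
            header.append(String.format("%3c", c));
        }
        System.out.println(header);
        // One row per prefix of s1; row 0 is the empty prefix
        for (int i = 0; i <= s1.length(); i++) {
            StringBuilder row = new StringBuilder(
                i == 0 ? "   " : String.format("%3c", s1.charAt(i - 1)));
            for (int v : dp[i]) {
                row.append(String.format("%3d", v));
            }
            System.out.println(row);
        }
    }

    public static void main(String[] args) {
        printTable("GIKY", "GEEKY"); // bottom-right cell holds the answer
    }
}
```

The answer is always the bottom-right cell, dp[m][n]; every other cell is the distance between a pair of prefixes.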
Time Complexity: O(m*n), where m is the length of the first string, and n is the length of the second string.
Auxiliary Space: O(m*n), as the matrix used in the above implementation has dimensions (m+1) x (n+1).
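Because row i of the table only reads from row i - 1 and from the cell to its left, the full matrix is not strictly needed when only the distance is required. The following is a sketch of this common space optimization (not part of the original implementation; the class name `LevenshteinDistanceTwoRows` is hypothetical) that keeps just two rows, reducing auxiliary space to O(n).

```java
// Bottom-up DP keeping only the previous and current rows of the table.
class LevenshteinDistanceTwoRows {

    static int distance(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int[] prev = new int[n + 1];   // row i - 1
        int[] curr = new int[n + 1];   // row i

        for (int j = 0; j <= n; j++) {
            prev[j] = j;               // row 0: turn "" into s2's first j chars
        }
        for (int i = 1; i <= m; i++) {
            curr[0] = i;               // column 0: delete all i characters
            for (int j = 1; j <= n; j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,     // match / replace
                          Math.min(prev[j] + 1,            // delete
                                   curr[j - 1] + 1));      // insert
            }
            int[] tmp = prev;          // swap: current row becomes previous
            prev = curr;
            curr = tmp;
        }
        return prev[n];                // after the last swap, prev holds row m
    }

    public static void main(String[] args) {
        System.out.println(distance("glomax", "folmax"));  // prints 3
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

The trade-off is that the individual edit operations can no longer be traced back from the table. Passing the shorter string as s2 keeps the rows as small as possible.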
Applications:
- Spell Checkers.
- Speech Recognition.
- DNA Analysis.
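As an illustration of the spell-checker use case, the sketch below suggests the dictionary word with the smallest Levenshtein distance to a misspelled input. The word list and the names (`SpellSuggest`, `closest`) are hypothetical; real spell checkers typically add more machinery (word frequencies, candidate pruning), so this shows only the core idea.

```java
import java.util.Arrays;
import java.util.List;

// Suggests the closest dictionary word by edit distance.
class SpellSuggest {

    // Two-row bottom-up Levenshtein distance
    static int distance(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        for (int j = 0; j <= s2.length(); j++) prev[j] = j;
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,
                          Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }

    // Returns the dictionary word closest to `word` (first one wins on ties)
    static String closest(String word, List<String> dictionary) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = distance(word, candidate);
            if (d < bestDist) {
                best = candidate;
                bestDist = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("geeks", "greek", "sunday", "saturday");
        System.out.println(closest("sundy", dictionary)); // prints sunday
    }
}
```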
Last Updated: 21 Feb, 2022