
Java Program to Implement Levenshtein Distance Computing Algorithm

The Levenshtein distance, also called the edit distance, is the minimum number of single-character operations required to transform one string into another.

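Typically, three types of operations are allowed, each applied to one character at a time:

  1. Insert a character.
  2. Delete a character.
  3. Replace a character with another.
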
Examples:

Input:  str1 = “glomax”, str2 = “folmax” 



Output: 3 

str1 is converted to str2 by replacing ‘g’ with ‘o’, deleting the second ‘o’, and inserting ‘f’ at the beginning. There is no way to do it with fewer than three edits.

Input:  s1 = “GIKY”, s2 = “GEEKY” 

Output: 2 

s1 is converted to s2 by inserting ‘E’ right after ‘G’ and replacing ‘I’ with ‘E’.

This problem can be solved in two ways:

  1. Using Recursion.
  2. Using Dynamic Programming.

Method 1: Recursive Approach  

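Consider an example: given two strings s1 = “sunday” and s2 = “saturday”, we want to convert “sunday” into “saturday” with the minimum number of edits. Here the answer is 3: insert ‘a’ and ‘t’ after the leading ‘s’, and replace ‘n’ with ‘r’.

The recursive approach compares the first characters of the two strings and then recurses on the remaining substrings. It tries a replacement (free if the characters already match), an insertion, and a deletion, and keeps the cheapest of the three.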
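The recursion can be written down as a recurrence. This is a sketch in notation that is not part of the original article: tail(x) is assumed to mean the string x without its first character, a_1 and b_1 are the first characters of a and b, and the bracket [a_1 ≠ b_1] is 1 when those characters differ and 0 otherwise.

```latex
\operatorname{lev}(a, b) =
\begin{cases}
|a| & \text{if } |b| = 0, \\
|b| & \text{if } |a| = 0, \\
\min \begin{cases}
\operatorname{lev}(\operatorname{tail}(a), \operatorname{tail}(b)) + [a_1 \neq b_1] & \text{(replace)} \\
\operatorname{lev}(a, \operatorname{tail}(b)) + 1 & \text{(insert)} \\
\operatorname{lev}(\operatorname{tail}(a), b) + 1 & \text{(delete)}
\end{cases} & \text{otherwise.}
\end{cases}
```

The two base cases correspond to the empty-string checks, and the three branches of the minimum correspond to the three recursive calls in a direct recursive implementation.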

Recursive Implementation:




// Java implementation of recursive Levenshtein distance
// calculation
 
import java.util.*;
class LevenshteinDistanceRecursive {
 
    static int compute_Levenshtein_distance(String str1,
                                            String str2)
    {
        // If str1 is empty, the only way to reach str2 with
        // the minimum number of operations is to insert
        // every character of str2.
 
        if (str1.isEmpty())
        {
            return str2.length();
        }
 
        // If str2 is empty, the only way to reach it with
        // the minimum number of operations is to delete
        // every character of str1.
 
        if (str2.isEmpty())
        {
            return str1.length();
        }
 
        // Replace: 0 if the first characters match, 1 if
        // they differ, plus the distance between the
        // remaining substrings.
 
        int replace = compute_Levenshtein_distance(
              str1.substring(1), str2.substring(1))
              + NumOfReplacement(str1.charAt(0),str2.charAt(0));
 
        // Insert: one edit to insert str2's first character,
        // plus the distance from str1 to the rest of str2.
        int insert = compute_Levenshtein_distance(
                         str1, str2.substring(1))+ 1;
 
        // Delete: one edit to delete str1's first character,
        // plus the distance from the rest of str1 to str2.
        int delete = compute_Levenshtein_distance(
                         str1.substring(1), str2)+ 1;
 
        // returns minimum of three operations
       
        return minm_edits(replace, insert, delete);
    }
 
    static int NumOfReplacement(char c1, char c2)
    {
        // 0 if the two characters match,
        // 1 if a replacement is needed
       
        return c1 == c2 ? 0 : 1;
    }
 
    static int minm_edits(int... nums)
    {
        // receives the count of different
        // operations performed and returns the
        // minimum value among them.
       
        return Arrays.stream(nums).min().orElse(
            Integer.MAX_VALUE);
    }
 
    // Driver Code
    public static void main(String args[])
    {
        String s1 = "glomax";
        String s2 = "folmax";
 
        System.out.println(compute_Levenshtein_distance(s1, s2));
    }
}

 
 

Output
3

 

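Time Complexity: Exponential. Each call branches into three recursive calls, and because an insertion or a deletion shortens only one of the two strings, the recursion can be up to m + n levels deep. This gives an upper bound of O(3^(m+n)), where m and n are the lengths of the two strings.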
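To get a feel for how quickly the number of recursive calls grows, the sketch below instruments the same recursion with a call counter. The class name LevenshteinCallCounter, the method countCalls, and the static calls field are hypothetical names introduced only for this illustration.

```java
// Illustrative sketch: counts how many calls the naive recursion makes.
// All names here are assumptions made for this example.
class LevenshteinCallCounter {

    // Number of recursive calls made during the current measurement
    static long calls = 0;

    // Same recursion as the naive approach, with a counter added
    static int distance(String a, String b) {
        calls++;
        if (a.isEmpty()) {
            return b.length();
        }
        if (b.isEmpty()) {
            return a.length();
        }
        int cost = a.charAt(0) == b.charAt(0) ? 0 : 1;
        int replace = distance(a.substring(1), b.substring(1)) + cost;
        int insert = distance(a, b.substring(1)) + 1;
        int delete = distance(a.substring(1), b) + 1;
        return Math.min(replace, Math.min(insert, delete));
    }

    // Resets the counter, runs the recursion, and returns the call count
    static long countCalls(String a, String b) {
        calls = 0;
        distance(a, b);
        return calls;
    }

    public static void main(String[] args) {
        String a = "";
        String b = "";
        // Grow both strings by one character per iteration
        for (int n = 1; n <= 8; n++) {
            a += "a";
            b += "b";
            System.out.println("length " + n + ": "
                               + countCalls(a, b) + " calls");
        }
    }
}
```

The call count depends only on the lengths of the two strings, not on their contents, since the recursion always explores all three branches.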

 

Method 2: Dynamic Programming Approach   

 

If we draw the recursion tree of the above solution, we can see that the same subproblems are computed again and again. Dynamic programming avoids this repeated work by storing the result of each subproblem the first time it is computed and reusing it afterwards.
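One way to store subproblem results is to keep the recursive structure and add a cache (top-down memoization). The following is a minimal sketch of that idea, not the article's own implementation. The class name LevenshteinDistanceMemo and the helper solve are hypothetical, and the sketch passes indices instead of substrings so that each subproblem is identified by a pair (i, j).

```java
import java.util.Arrays;

// Illustrative sketch of top-down memoization; names are assumptions.
class LevenshteinDistanceMemo {

    static int distance(String s1, String s2) {
        // memo[i][j] caches the distance between the suffixes
        // s1[i..] and s2[j..]; -1 means "not computed yet"
        int[][] memo = new int[s1.length() + 1][s2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
        return solve(s1, s2, 0, 0, memo);
    }

    private static int solve(String s1, String s2, int i, int j,
                             int[][] memo) {
        // s1 exhausted: insert the rest of s2
        if (i == s1.length()) {
            return s2.length() - j;
        }
        // s2 exhausted: delete the rest of s1
        if (j == s2.length()) {
            return s1.length() - i;
        }
        // Reuse a previously computed answer
        if (memo[i][j] != -1) {
            return memo[i][j];
        }
        int cost = s1.charAt(i) == s2.charAt(j) ? 0 : 1;
        int replace = solve(s1, s2, i + 1, j + 1, memo) + cost;
        int insert = solve(s1, s2, i, j + 1, memo) + 1;
        int delete = solve(s1, s2, i + 1, j, memo) + 1;
        memo[i][j] = Math.min(replace, Math.min(insert, delete));
        return memo[i][j];
    }

    public static void main(String[] args) {
        System.out.println(distance("glomax", "folmax"));
        System.out.println(distance("sunday", "saturday"));
    }
}
```

Each pair (i, j) is solved at most once, so the running time drops to O(m*n) at the cost of O(m*n) extra space for the cache.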

 

 

Dynamic Programming Implementation (Optimised approach):

 




// Java implementation of Levenshtein distance calculation
// Using Dynamic Programming (Optimised solution)
 
import java.util.*;
class LevenshteinDistanceDP {
 
    static int compute_Levenshtein_distanceDP(String str1,
                                              String str2)
    {
 
        // A 2-D matrix storing the answers to subproblems:
        // dp[i][j] is the distance between the first i
        // characters of str1 and the first j of str2.
 
        int[][] dp = new int[str1.length() + 1][str2.length() + 1];
 
        for (int i = 0; i <= str1.length(); i++)
        {
            for (int j = 0; j <= str2.length(); j++) {
 
                // If str1 is empty, the only way to reach
                // str2 with the minimum number of
                // operations is to insert all j characters
                // of str2.
                if (i == 0) {
                    dp[i][j] = j;
                }
 
                // If str2 is empty, the only way to reach
                // it with the minimum number of operations
                // is to delete all i characters of str1.
                else if (j == 0) {
                    dp[i][j] = i;
                }
 
                else {
                    // find the minimum among three
                    // operations below
 
                     
                    dp[i][j] = minm_edits(dp[i - 1][j - 1]
                        + NumOfReplacement(str1.charAt(i - 1),str2.charAt(j - 1)), // replace
                        dp[i - 1][j] + 1, // delete
                        dp[i][j - 1] + 1); // insert
                }
            }
        }
 
        return dp[str1.length()][str2.length()];
    }
 
    // returns 0 if the two characters match,
    // 1 if a replacement is needed
   
    static int NumOfReplacement(char c1, char c2)
    {
        return c1 == c2 ? 0 : 1;
    }
 
    // receives the count of different
    // operations performed and returns the
    // minimum value among them.
   
    static int minm_edits(int... nums)
    {
 
        return Arrays.stream(nums).min().orElse(
            Integer.MAX_VALUE);
    }
 
    // Driver Code
    public static void main(String args[])
    {
 
        String s1 = "glomax";
        String s2 = "folmax";
 
        System.out.println(compute_Levenshtein_distanceDP(s1, s2));
    }
}

 
 

Output
3

 

Time Complexity: O(m*n), where m is the length of the first string, and n is the length of the second string.

 

Auxiliary Space: O(m*n), as the matrix used in the above implementation has dimensions (m+1)*(n+1).
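Since each row of the matrix depends only on the row directly above it, the full matrix is not strictly needed when only the distance is required. The sketch below keeps just two rows. It is an illustrative variant rather than part of the original implementation, and the class name LevenshteinDistanceTwoRows is hypothetical. It also swaps the strings so the rows follow the shorter one, which is safe because the distance is symmetric.

```java
// Illustrative two-row variant; names are assumptions for this example.
class LevenshteinDistanceTwoRows {

    static int distance(String s1, String s2) {
        // Lay the shorter string along the row to minimise memory
        if (s2.length() > s1.length()) {
            String tmp = s1;
            s1 = s2;
            s2 = tmp;
        }

        int n = s2.length();
        int[] prev = new int[n + 1]; // row i - 1 of the full matrix
        int[] curr = new int[n + 1]; // row i of the full matrix

        // Row 0: converting an empty prefix of s1 into s2[0..j)
        for (int j = 0; j <= n; j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s1.length(); i++) {
            // Column 0: deleting the first i characters of s1
            curr[0] = i;
            for (int j = 1; j <= n; j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,      // replace
                          Math.min(prev[j] + 1,             // delete
                                   curr[j - 1] + 1));       // insert
            }
            // The current row becomes the previous row
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        // After the final swap, prev holds the last computed row
        return prev[n];
    }

    public static void main(String[] args) {
        System.out.println(distance("glomax", "folmax"));
    }
}
```

This keeps the O(m*n) running time while reducing the auxiliary space to O(min(m, n)). The trade-off is that the sequence of edits can no longer be recovered by walking back through the table.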

 

Applications: 

 

 

