Dynamic Programming | Set 5 (Edit Distance)

Given two strings str1 and str2 and the operations below, which can be performed on str1, find the minimum number of edits (operations) required to convert 'str1' into 'str2'.

  1. Insert
  2. Remove
  3. Replace

All of the above operations are of equal cost.

Examples:

Input:   str1 = "geek", str2 = "gesek"
Output:  1
We can convert str1 into str2 by inserting a 's'.

Input:   str1 = "cat", str2 = "cut"
Output:  1
We can convert str1 into str2 by replacing 'a' with 'u'.

Input:   str1 = "sunday", str2 = "saturday"
Output:  3
The last three characters and the first character are the same, so
we only need to convert "un" to "atur". This can be done using
the three operations below:
Replace 'n' with 'r', insert 't', insert 'a'.

What are the subproblems in this case?
The idea is to process all characters one by one, starting from either the left or the right end of both strings.
Let us traverse from the right end. There are two possibilities for every pair of characters being traversed.

m: Length of str1 (first string)
n: Length of str2 (second string)
  1. If the last characters of the two strings are the same, there is nothing much to do. Ignore the last characters and get the count for the remaining strings, so we recur for lengths m-1 and n-1.
  2. Else (if the last characters are not the same), we consider all three operations on the last character of the first string, recursively compute the minimum cost for each, and take the minimum of the three values.
    1. Insert: Recur for m and n-1
    2. Remove: Recur for m-1 and n
    3. Replace: Recur for m-1 and n-1
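
The cases above can be collected into a single recurrence. The notation D(i, j) is introduced here only for illustration (it does not appear elsewhere in this article): it stands for the edit distance between the first i characters of str1 and the first j characters of str2. The two base cases (one string empty) are the ones handled at the top of the code that follows.

```latex
D(i, j) =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
D(i-1,\, j-1) & \text{if } str1[i-1] = str2[j-1] \\
1 + \min\{\, D(i,\, j-1),\ D(i-1,\, j),\ D(i-1,\, j-1) \,\} & \text{otherwise}
\end{cases}
```

The required answer is D(m, n); the three terms inside the min correspond to Insert, Remove and Replace, in that order.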

Below are C++, Java and Python implementations of the above naive recursive solution.

C++

// A naive recursive C++ program to find the minimum number
// of operations to convert str1 to str2
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;

// Utility function to find minimum of three numbers
int min(int x, int y, int z)
{
   return min(min(x, y), z);
}

// Strings are passed by const reference to avoid copying them on every call
int editDist(const string& str1, const string& str2, int m, int n)
{
    // If first string is empty, the only option is to
    // insert all characters of second string into first
    if (m == 0) return n;

    // If second string is empty, the only option is to
    // remove all characters of first string
    if (n == 0) return m;

    // If last characters of two strings are same, nothing
    // much to do. Ignore last characters and get count for
    // remaining strings.
    if (str1[m-1] == str2[n-1])
        return editDist(str1, str2, m-1, n-1);

    // If last characters are not same, consider all three
    // operations on last character of first string, recursively
    // compute minimum cost for all three operations and take
    // minimum of three values.
    return 1 + min ( editDist(str1,  str2, m, n-1),    // Insert
                     editDist(str1,  str2, m-1, n),   // Remove
                     editDist(str1,  str2, m-1, n-1) // Replace
                   );
}

// Driver program
int main()
{
    string str1 = "sunday";
    string str2 = "saturday";

    cout << editDist( str1 , str2 , str1.length(), str2.length());

    return 0;
}

Java

// A naive recursive Java program to find the minimum number
// of operations to convert str1 to str2
class EDIST
{
    // Utility function to find the minimum of three numbers.
    // Math.min also handles ties correctly (e.g. x == y < z).
    static int min(int x,int y,int z)
    {
        return Math.min(Math.min(x, y), z);
    }

    static int editDist(String str1 , String str2 , int m ,int n)
    {
        // If first string is empty, the only option is to
        // insert all characters of second string into first
        if (m == 0) return n;

        // If second string is empty, the only option is to
        // remove all characters of first string
        if (n == 0) return m;

        // If last characters of two strings are same, nothing
        // much to do. Ignore last characters and get count for
        // remaining strings.
        if (str1.charAt(m-1) == str2.charAt(n-1))
            return editDist(str1, str2, m-1, n-1);

        // If last characters are not same, consider all three
        // operations on last character of first string, recursively
        // compute minimum cost for all three operations and take
        // minimum of three values.
        return 1 + min ( editDist(str1,  str2, m, n-1),    // Insert
                         editDist(str1,  str2, m-1, n),   // Remove
                         editDist(str1,  str2, m-1, n-1) // Replace
                       );
    }

    public static void main(String args[])
    {
        String str1 = "sunday";
        String str2 = "saturday";
 
        System.out.println( editDist( str1 , str2 , str1.length(), str2.length()) );
    }
}
/*This code is contributed by Rajat Mishra*/

Python

# A naive recursive Python program to find the minimum number
# of operations to convert str1 to str2
def editDistance(str1, str2, m , n):

    # If first string is empty, the only option is to
    # insert all characters of second string into first
    if m==0:
        return n

    # If second string is empty, the only option is to
    # remove all characters of first string
    if n==0:
        return m

    # If last characters of two strings are same, nothing
    # much to do. Ignore last characters and get count for
    # remaining strings.
    if str1[m-1]==str2[n-1]:
        return editDistance(str1,str2,m-1,n-1)

    # If last characters are not same, consider all three
    # operations on last character of first string, recursively
    # compute minimum cost for all three operations and take
    # minimum of three values.
    return 1 + min(editDistance(str1, str2, m, n-1),    # Insert
                   editDistance(str1, str2, m-1, n),    # Remove
                   editDistance(str1, str2, m-1, n-1)    # Replace
                   )

# Driver program to test the above function
str1 = "sunday"
str2 = "saturday"
print(editDistance(str1, str2, len(str1), len(str2)))

# This code is contributed by Bhavya Jain


Output:

3

The time complexity of the above solution is exponential. In the worst case, we may end up doing O(3^m) operations. The worst case happens when none of the characters of the two strings match. Below is a recursive call diagram for the worst case.
[Figure: EditDistance recursion tree for the worst case]
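
The repeated work can also be observed directly. The following is a minimal sketch, not part of the original programs: `count_calls` and its inner `edit_dist` are hypothetical helper names, and the inputs "abc" and "xyz" are chosen only because they share no characters, which forces the worst case.

```python
from collections import Counter

def count_calls(str1, str2):
    """Run the naive recursion and count how often each (m, n) pair is visited."""
    calls = Counter()

    def edit_dist(m, n):
        calls[(m, n)] += 1
        if m == 0:
            return n
        if n == 0:
            return m
        if str1[m - 1] == str2[n - 1]:
            return edit_dist(m - 1, n - 1)
        return 1 + min(edit_dist(m, n - 1),      # Insert
                       edit_dist(m - 1, n),      # Remove
                       edit_dist(m - 1, n - 1))  # Replace

    distance = edit_dist(len(str1), len(str2))
    return distance, calls

distance, calls = count_calls("abc", "xyz")
print(distance)             # 3
print(calls[(2, 2)])        # 3: the subproblem (2, 2) is solved three times
print(sum(calls.values()))  # 94 calls in total for two 3-character strings
```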

We can see that many subproblems are solved again and again; for example, eD(2,2) is called three times. Since the same subproblems are called again, this problem has the Overlapping Subproblems property. So the Edit Distance problem has both properties (overlapping subproblems and optimal substructure) of a dynamic programming problem. As with other typical Dynamic Programming (DP) problems, recomputation of the same subproblems can be avoided by constructing a temporary array that stores the results of subproblems.
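
One way to store subproblem results while keeping the recursive structure is top-down memoization. The sketch below is an illustration only: `edit_dist_memo` and `solve` are hypothetical names, and it uses Python's standard `functools.lru_cache` rather than an explicit array. Each (m, n) pair is computed at most once, so the running time drops to O(m x n). For very long strings the recursion depth (up to m + n) may exceed Python's default limit, which is one reason to prefer the bottom-up table.

```python
from functools import lru_cache

def edit_dist_memo(str1, str2):
    @lru_cache(maxsize=None)  # caches the result for every (m, n) pair
    def solve(m, n):
        if m == 0:
            return n
        if n == 0:
            return m
        if str1[m - 1] == str2[n - 1]:
            return solve(m - 1, n - 1)
        return 1 + min(solve(m, n - 1),      # Insert
                       solve(m - 1, n),      # Remove
                       solve(m - 1, n - 1))  # Replace

    return solve(len(str1), len(str2))

print(edit_dist_memo("sunday", "saturday"))  # 3
```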

C++

// A Dynamic Programming based C++ program to find the minimum
// number of operations to convert str1 to str2
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Utility function to find minimum of three numbers
int min(int x, int y, int z)
{
    return min(min(x, y), z);
}

int editDistDP(const string& str1, const string& str2, int m, int n)
{
    // Create a table to store results of subproblems
    // (std::vector is used because variable-length arrays are not standard C++)
    vector<vector<int>> dp(m + 1, vector<int>(n + 1));

    // Fill dp[][] in bottom up manner
    for (int i=0; i<=m; i++)
    {
        for (int j=0; j<=n; j++)
        {
            // If first string is empty, only option is to
            // insert all characters of second string
            if (i==0)
                dp[i][j] = j;  // Min. operations = j

            // If second string is empty, only option is to
            // remove all characters of first string
            else if (j==0)
                dp[i][j] = i; // Min. operations = i

            // If last characters are same, ignore last char
            // and use the result for the remaining strings
            else if (str1[i-1] == str2[j-1])
                dp[i][j] = dp[i-1][j-1];

            // If last characters are different, consider all
            // possibilities and find minimum
            else
                dp[i][j] = 1 + min(dp[i][j-1],  // Insert
                                   dp[i-1][j],  // Remove
                                   dp[i-1][j-1]); // Replace
        }
    }

    return dp[m][n];
}

// Driver program
int main()
{
    string str1 = "sunday";
    string str2 = "saturday";

    cout << editDistDP(str1, str2, str1.length(), str2.length());

    return 0;
}

Java

// A Dynamic Programming based Java program to find the minimum
// number of operations to convert str1 to str2
class EDIST
{
    // Utility function to find the minimum of three numbers.
    // Math.min also handles ties correctly (e.g. x == y < z).
    static int min(int x,int y,int z)
    {
        return Math.min(Math.min(x, y), z);
    }

    static int editDistDP(String str1, String str2, int m, int n)
    {
        // Create a table to store results of subproblems
        int dp[][] = new int[m+1][n+1];

        // Fill dp[][] in bottom up manner
        for (int i=0; i<=m; i++)
        {
            for (int j=0; j<=n; j++)
            {
                // If first string is empty, only option is to
                // insert all characters of second string
                if (i==0)
                    dp[i][j] = j;  // Min. operations = j

                // If second string is empty, only option is to
                // remove all characters of first string
                else if (j==0)
                    dp[i][j] = i; // Min. operations = i

                // If last characters are same, ignore last char
                // and use the result for the remaining strings
                else if (str1.charAt(i-1) == str2.charAt(j-1))
                    dp[i][j] = dp[i-1][j-1];

                // If last characters are different, consider all
                // possibilities and find minimum
                else
                    dp[i][j] = 1 + min(dp[i][j-1],  // Insert
                                       dp[i-1][j],  // Remove
                                       dp[i-1][j-1]); // Replace
            }
        }

        return dp[m][n];
    }

    public static void main(String args[])
    {
        String str1 = "sunday";
        String str2 = "saturday";
        System.out.println( editDistDP( str1 , str2 , str1.length(), str2.length()) );
    }
}/*This code is contributed by Rajat Mishra*/

Python

# A Dynamic Programming based Python program for edit
# distance problem
def editDistDP(str1, str2, m, n):
    # Create a table to store results of subproblems
    dp = [[0 for _ in range(n+1)] for _ in range(m+1)]

    # Fill dp[][] in bottom up manner
    for i in range(m+1):
        for j in range(n+1):

            # If first string is empty, only option is to
            # insert all characters of second string
            if i == 0:
                dp[i][j] = j    # Min. operations = j

            # If second string is empty, only option is to
            # remove all characters of first string
            elif j == 0:
                dp[i][j] = i    # Min. operations = i

            # If last characters are same, ignore last char
            # and recur for remaining string
            elif str1[i-1] == str2[j-1]:
                dp[i][j] = dp[i-1][j-1]

            # If last character are different, consider all
            # possibilities and find minimum
            else:
                dp[i][j] = 1 + min(dp[i][j-1],        # Insert
                                   dp[i-1][j],        # Remove
                                   dp[i-1][j-1])    # Replace

    return dp[m][n]

# Driver program
str1 = "sunday"
str2 = "saturday"

print(editDistDP(str1, str2, len(str1), len(str2)))
# This code is contributed by Bhavya Jain

Output:

3

Time Complexity: O(m x n)
Auxiliary Space: O(m x n)
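
If only the distance is needed (not the sequence of operations), the auxiliary space can be reduced, because each row of the table depends only on the row above it. The following is a sketch under that assumption; `edit_dist_two_rows` is a hypothetical name and this variant is not part of the original programs. It keeps two rows and puts the shorter string along the row, giving O(min(m, n)) extra space. Swapping the strings is safe here only because all three operations have equal cost, which makes the distance symmetric.

```python
def edit_dist_two_rows(str1, str2):
    # Keep the shorter string along the row (valid because costs are equal,
    # so the distance from str1 to str2 equals the distance from str2 to str1).
    if len(str2) > len(str1):
        str1, str2 = str2, str1

    prev = list(range(len(str2) + 1))  # row for i = 0: insert j characters
    for i in range(1, len(str1) + 1):
        curr = [i] + [0] * len(str2)   # column j = 0: remove i characters
        for j in range(1, len(str2) + 1):
            if str1[i - 1] == str2[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(curr[j - 1],  # Insert
                                  prev[j],      # Remove
                                  prev[j - 1])  # Replace
        prev = curr
    return prev[len(str2)]

print(edit_dist_two_rows("sunday", "saturday"))  # 3
```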

Applications: There are many practical applications of the edit distance algorithm; refer to the Lucene API for a sample. Another example: display all the words in a dictionary that are in close proximity to a given word or an incorrectly spelled word.
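
The dictionary example can be sketched as follows. This is only an illustration: the function `suggest`, the threshold `max_distance` and the small word list are assumptions made for this example, and the code does not use or imitate the Lucene API. It simply computes the edit distance to every dictionary word, which is fine for small lists but too slow for a real dictionary.

```python
def edit_distance(a, b):
    """Bottom-up edit distance using the same recurrence as editDistDP."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j
            elif j == 0:
                dp[i][j] = i
            elif a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1])
    return dp[m][n]

def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]

words = ["sunday", "saturday", "monday", "sundae", "sandy"]  # hypothetical dictionary
print(suggest("sundy", words))  # ['sandy', 'sunday', 'sundae']
```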

Thanks to Vivek Kumar for suggesting above updates.

Thanks to Venki for providing the initial post. Please write comments if you find anything incorrect, or if you want to share more information about the topic discussed above.




  • Vivek

    if (X[i]==Y[j])
    it should be only
    T[i][j]=T[i-1][j-1];

    in the above code, why are we considering insertion and deletion
    if X[i] equals Y[j] , isn’t it redundant?

  • prashant jha

    Here, in the naive recursive implementation, the complexity will be O(3^n),
    but in DP there are exactly m*n subproblems.
    Here is my simple implementation using DP:
    http://ideone.com/XAUoU9

  • prashant jha

    #include <iostream>
    #include <cstring>
    #define m 20
    int arr[m][m];
    using namespace std;
    int min(int a,int b)
    {
    return a>b?b:a;
    }
    int min(int a,int b,int c)
    {
    return min(min(a,b),c);
    }
    int fun(char st1[],char st2[],int low1,int low2,int high1,int high2)
    {
    if(arr[low1][low2]!=-1)
    return arr[low1][low2];
    if(low1>high1)
    return (high2-low2+1);
    if(low2>high2)
    return (high1-low1+1);
    arr[low1+1][low2]=fun(st1,st2,low1+1,low2,high1,high2);
    arr[low1][low2+1]=fun(st1,st2,low1,low2+1,high1,high2);
    arr[low1+1][low2+1]=fun(st1,st2,low1+1,low2+1,high1,high2);
    if(st1[low1]!=st2[low2])
    arr[low1][low2]=min(1+arr[low1+1][low2],1+arr[low1][low2+1],1+arr[low1+1][low2+1]);
    else
    arr[low1][low2]=min(1+arr[low1+1][low2],1+arr[low1][low2+1],arr[low1+1][low2+1]);
    return arr[low1][low2];
    }
    int main()
    {
    char st1[] = "sunday" ;
    char st2[] = "saturday" ;
    for(int i=0;i<m;i++)
    {
    for(int j=0;j<m;j++)
    {
    arr[i][j]=-1;
    }
    }
    cout<<fun(st1,st2,0,0,strlen(st1)-1,strlen(st2)-1)<<" is minimum possible changes to convert s1 to s2.\n";
    return 0;
    }

  • alext

    Detailed explanation:
    "Alignment" is an important concept in this problem. For example, for "SUNDAY" -> "SATURDAY" the alignment should be "S _ _ U N D A Y" with "S A T U R D A Y", and the Levenshtein distance is 3. See the blanks? Blanks are treated as part of the string and contribute to the alignment. Here is the explanation: take, for example, "S A T _" -> "S B _", that is E(3, 2). If we align 'T' with 'B', we get E(3, 2) = E(2, 1) + 1. If we align 'T' with '_' (from the 2nd string), we need to DELETE 'T' from the first string, so E(3, 2) = E(2, 2) + 1. If we align '_' (from the 1st string) with 'B', we need to INSERT 'B' into the first string, hence E(3, 2) = E(3, 1) + 1.
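
The alignment described in this comment can be recovered from the DP table by walking back from the bottom-right cell. The following is a minimal sketch, not code from the article: `align` is a hypothetical name, '_' marks a blank, and the inputs are assumed not to contain '_' themselves. When several optimal alignments exist, the one returned depends on the order in which the branches are tried.

```python
def align(str1, str2):
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j
            elif j == 0:
                dp[i][j] = i
            elif str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1])

    # Walk back from dp[m][n], emitting one aligned column per step.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and str1[i - 1] == str2[j - 1]:
            top.append(str1[i - 1]); bottom.append(str2[j - 1])  # match
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            top.append(str1[i - 1]); bottom.append(str2[j - 1])  # Replace
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            top.append("_"); bottom.append(str2[j - 1])          # Insert
            j -= 1
        else:
            top.append(str1[i - 1]); bottom.append("_")          # Remove
            i -= 1
    return dp[m][n], "".join(reversed(top)), "".join(reversed(bottom))

distance, top, bottom = align("sunday", "saturday")
print(distance)  # 3
print(top)
print(bottom)
```

Every column where the two printed rows differ corresponds to one edit, so the number of differing columns equals the distance.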

  • zealfire

    It says for deletion we need to say t[i][j-1] and for insertion t[i-1][j].how is it so,what i could understand it should be reversed.please comment

  • prashantjha

    /*
    int fun(char st1[],char st2[],int i,int j,int h1,int h2)
    {
    if((i>h1)&&(j>h2))
    return 0;
    if(i>h1)
    return (h2-j+1);
    if(j>h2)
    return (h1-i+1);
    if(st1[i]==st2[j])
    return (fun(st1,st2,i+1,j+1,h1,h2));
    else
    return min((1+fun(st1,st2,i+1,j,h1,h2)),
    (1+fun(st1,st2,i,j+1,h1,h2)),
    (1+fun(st1,st2,i+1,j+1,h1,h2)));
    }
    */

  • Rushikesh

    Hi,

    Here is the code that I could come up with to solve this problem:

    public class StringTest
    {
    public static void main(String[] args)
    {
    String str1 = args[0];
    String str2 = args[1];

    int num = minOper(str1, str2);
    System.out.println("Minimum Operations = "+num);
    }

    public static int minOper(String str1, String str2)
    {
    int len1 = str1.length();
    int len2 = str2.length();

    int nInsert = 0;
    int nDelete = 0;
    int nMod = 0;

    String small = null;
    String big = null;
    if(len1 < len2)
    {
    nInsert = len2 - len1;
    small = str1;
    big = str2;
    }
    else
    {
    nDelete = len1 - len2;
    small = str2;
    big = str1;
    }

    int highestMatchingChars = 0;
    int matchingChars = 0;
    char[] smallChars = new char[small.length()+1];
    small.getChars(0, small.length(), smallChars, 0);
    for(int i=0;i<=(nInsert+nDelete);i++)
    {
    matchingChars = 0;
    char[] bigChars = new char[small.length()+1];
    big.getChars(i, i+small.length(), bigChars, 0);
    for(int j=0;j<small.length();j++)
    {
    if(smallChars[j] == bigChars[j])
    {
    System.out.println("Small char = "+smallChars[j]+"\tBig char = "+bigChars[j]);
    matchingChars++;
    }
    }

    if(highestMatchingChars < matchingChars)
    highestMatchingChars = matchingChars;
    }

    nMod = small.length()-highestMatchingChars;
    return (nInsert + nDelete + nMod);
    }
    }

    Please comment if anything missing in the algorithm/code here?

  • hello

    if same cost for R,D,I operation then output is maximum of given two string lengths – length of longest common subsequence of given two strings. i think so… ? :)

    • Mukesh M

      Not True. Consider Str1 and Str2 being “AEDFHR” and “ABCDGH”. LCS is “ADH” but edit distance is 4. Although you are correct in using LCS but way to find edit distance would be break strings on matching points and then sum up max(str1_1.len,str2_1.len)+max(str1_2.len,str2_2.len)+max(str1_3.len,str2_3.len)…

  • me.deeiip

    How is it D.P. if recursion is not memoized?

    • what’s in the name

      Actually in bottom up manner only those table entries are queried which have already been entered. Hence it isn’t called memoization in true terms which is for top down manner.
      *(T + (i)*n + (j)) = Minimum(leftCell, topCell, cornerCell);
      This line makes entry to the table for future use .

  • Shimpu

    Can anyone please elaborate clearly the alignment thing and how the code it working??

  • Nitesh

    The Recursive code checks for left and right even when X[m-1]==Y[n-1] , it should simply call the next corner case instead of left and right , will that miss any special case??

    Recursive code :

    # include
    # include
    # include
    # include
    # include
    using namespace std;
    string a,b;
    int m,n;
    int minimum(int a, int b, int c)
    {
    return(min(min(a,b),c));
    }
    int edist(int i, int j, int count)
    {
    if(i==m||j==n)
    return count;
    if(a[i]==b[j])
    {
    return(edist(i+1,j+1,count));
    }
    else
    {
    int a=edist(i+1,j,count+1);
    int b=edist(i+1,j+1,count+1);
    int c=edist(i,j+1,count+1);
    return(minimum(a,b,c));
    }
    }
    int main()
    {
    a="SUNDAY";
    b="SATURDAY";
    m=a.length();
    n=b.length();
    printf("%d",edist(0,0,0));
    }

  • anon

    This is very bad,lossy description provided here.Its like first written in Chinese and then translated into english.Not expected from the team.There must be some IMAGES showing what you are trying to say.

  • Kaidul Islam Sazal

    The Dynamic programming portion is buggy.
    This is the right implementation

    int EditDistanceDP(char X[], char Y[], int lenX, int lenY) {

    // T[m][n]
    int T[lenX + 1][lenY + 1];

    for(int i = 0; i <= lenX; i++) T[i][0] = i;
    for(int i = 0; i <= lenY; i++) T[0][i] = i;

    for(int i = 1; i <= lenX; i++) {
    for(int j = 1; j <= lenY; j++) {
    if (X[i - 1] == Y[j - 1])
    T[i][j] = T[i - 1][j - 1];
    else
    T[i][j] = Minimum(T[i - 1][j], T[i][j - 1], T[i - 1][j - 1]) + 1;
    }
    }

    return T[lenX][lenY];
    }

  • Chandan Mittal

    Please tell what does ‘align’ means in the 3 cases above?

  • adit
     
    /* Short Implementation */
    
    int EditDistanceDP(char X[], char Y[])
    {
    	        int lx=strlen(X),ly=strlen(Y);
    	        int edit[lx+1][ly+1];
    
    	        for(int i=0;i<=lx;++i)
    	    		edit[i][0]=i;
    		for(int i=0;i<=ly;++i)
    	    		edit[0][i]=i;
    		for(int i=1;i<=lx;++i)
    			for(int j=1;j<=ly;++j)
    				edit[i][j]=min(edit[i-1][j-1]+!(X[i-1]==Y[j-1]),min(edit[i][j-1],edit[i-1][j])+1);
    
    return edit[lx][ly];  
    }
    
     
    • adit

      sorry guys for so many posts ! posting for the first time !

  • t_thirupathi

    A small correction in the sentence –
    “Given strings SUNDAY and SATURDAY. We want to convert SUNDAY into SATURDAY with minimum edits. Let us pick i = 2 and j = 4 i.e. prefix strings are SUN and SATU respectively (assume the strings indices start at 1). The right most characters can be aligned in three different ways.”

    Instead of “i = 2 and j = 4″, shouldn’t it be “i = 3 and j = 4″?

    • Nagaraju

      It is i=2 only and their logic follows this, but mistake here is they wrote “SUN” instead of “SU”

       
  • rahul23

    @venki

    In DP implementation:-
    // T[i][j-1]
    leftCell = *(T + i*n + j-1);
    leftCell += EDIT_COST; // deletion

    // T[i-1][j]
    topCell = *(T + (i-1)*n + j);
    topCell += EDIT_COST; // insertion

    Deletion should be insertion and insertion shoud be deletion

    Like if we have A in X and Y is C
    then A->Y
    delete will say delete A…we need to find cost for NULL->Y
    which will be given by [i-1][j]

    and insertion is given by [i][j-1]
    as if A->Y
    we will insert Y and find cost to A->NULL

    [i][j-1]
    Plz update it if m ryt,otherwise correct me.w8ing for ur response

  • rahul23

    @venki Inthe recursive sifinition of fxn
    the following line

    int corner = EditDistanceRecursion(X, Y, m-1, n-1) + (X[m] != Y[n]);
    should be changed to
    int corner = EditDistanceRecursion(X, Y, m-1, n-1) + (X[m-1] != Y[n-1]);
    For eg. if we have X=”A” and Y=”X”;then min should be 1.
    But your function will compare m and n index value(1 and 1 index)as m and n contains 1 which is NULL
    and considering these equal and replacement cost is 0 and it calls for m-1 and n-1 which will be 0..so 0+0 will become 0…Kindly update in corner variable m-1 and n-1.

  • harshieee

    program giving wrong output for
    s1 = “hello”
    s2 = “hellooo”

    output is cuming:
    Minimum edits required to convert hello into hellooo is 2
    Minimum edits required to convert hello into hellooo is 5 by recursion

    • Swapnil R Mehta

      There is a problem in “Minimum” function, thus answers are coming different with dp and recursive approach.
      Please make it as follows:

       
      int Minimum(int a, int b, int c)
      {
          int min;
          if( a < b && a < c ) min = a;
          else if( b < a && b < c ) min = b;
          else min = c;
          return min;
      }
       
      • Thanks both of you for pointing the error. Code is updated.

      • sobhan

        #include
        using namespace std;

        T[i][j]=min(leftcell,min(cornercell,topcell));

  • shine

    can we think of applying these oprations in certain conditions…like insert or delete can give min cost if l1l2 delete or replace may be beneficial…plz do reply

  • Alka

    What if we convert “SATURDAY” to “SUNDAY”? Results in both the methods used above are different.

    • wgpshashank

      Yes , its should be and it will.

  • sreeram

    i think in the base cases E(i,0) it should be like i*EDIT_COST instead of i

    • Yes. In the current program we took all edit operations of same cost.

  • Manak

    Correct me if I am wrong, but can this question be solved by first finding the largest common sub-sequence and then subtracting it from the length of the greater string?

    • Venki

      No, that will not always lead to optimum alignment.

      • Manak

        Can you specify an example? I cannot get my head around this.

        • Venki

          Manak, you can take another example given in the content. Consider the words

          exponential – ponil = exent
          polynomial – ponil = lyom

          But the ED(exponential, polynomial) != ED(exent, lyom), here ED stands for Edit Distance.

          Practice with few examples, if still not clear, let me know. I would need some time for detailed explanation.

          • zyfo2

            Don’t get it. Can you give a more detailed example including how to edit? Thank you.

          • zyfo2

            get it.
            the example is like
            abcd
            cde
            LCS=2 “cd”
            but edit distance is 3

          • zyfo2

            In face if only deletion and insertion are possible. then LCS can be applied here

          • Silent

            I guess we can do it by LCS..we would have to compare longest common subsequence with both the strings a character at a time.. correct me if i am wrong??
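
The relationship discussed in this thread can be checked on the examples given by the commenters. The sketch below is an illustration only, with hypothetical helper names `lcs_length` and `edit_distance`. It compares three quantities: the edit distance with Insert, Remove and Replace; the value "length of the longer string minus LCS" proposed above; and the distance when only Insert and Remove are allowed, which equals len(a) + len(b) - 2 * LCS.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]

def edit_distance(a, b):
    """Edit distance with Insert, Remove and Replace, all of cost 1."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j
            elif j == 0:
                dp[i][j] = i
            elif a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1])
    return dp[m][n]

for a, b in [("abcd", "cde"), ("AEDFHR", "ABCDGH")]:
    lcs = lcs_length(a, b)
    print(a, b,
          "| edit distance:", edit_distance(a, b),
          "| longer length - LCS:", max(len(a), len(b)) - lcs,
          "| insert/remove only:", len(a) + len(b) - 2 * lcs)
# abcd cde | edit distance: 3 | longer length - LCS: 2 | insert/remove only: 3
# AEDFHR ABCDGH | edit distance: 4 | longer length - LCS: 3 | insert/remove only: 6
```

In both examples "longer length minus LCS" underestimates the edit distance, which is consistent with the replies above.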

  • bhuvi

    The following line should be changed

     
     cornerCell += (X[i-1] != Y[j-1]);
     

    to

     
     cornerCell += (X[i] != Y[j]);
     

    since we are at i,j we should be comparing x[i] and y[j]. What say?

    • Venki

      Thanks for comment. The indexing is not an error. Please read the content. We use table of size m+1 x n+1. The indices i and j and one step ahead of the string location, so we need to subtract 1.

  • Saurabh Jain

    import java.util.Scanner;

    /**
    *
    * @author saurabh
    */
    public class EditDistanceDPP
    {
    char[] s1,s2;

    public EditDistanceDPP()
    {
    Scanner sc = new Scanner(System.in);
    s1 = sc.nextLine().toCharArray();
    s2 = sc.nextLine().toCharArray();
    System.out.println("Edit distance is : "+editDistance(s1,s2));
    }

    private int editDistance(char[] st1, char[] st2)
    {
    int[][] s = new int[s1.length+1][s2.length+1];
    for(int i=0; i<=s1.length; i++)
    {
    for(int j=0; j<=s2.length; j++)
    {
    if(i==0)
    s[i][j]=j;
    else if(j==0)
    s[i][j]=i;
    else
    s[i][j] = min(s[i-1][j-1]+(st1[i-1]==st2[j-1]?0:1),s[i-1][j]+1,s[i][j-1]+1);
    }
    }
    return s[s1.length][s2.length];
    }

    int min(int a, int b, int c)
    {
    return(a<b?a<c?a:c:b<c?b:c);
    }

    public static void main(String[] args)
    {
    EditDistanceDPP edd = new EditDistanceDPP();
    }
    }

    This is a quite simple Dynamic Programming approach with time complexity as O(m*n) and space complexity also as O(m*n)….

    • Saurabh Jain

      Correct me..if anything is wrong in the above code…thanks….

      • SAM

        your code is working fine!! did anyone pointed some mistakes in it??

  • robin singh

    kindly quote some references to this problem so that it becomes more clear.
    thankyou

     
    • Venki

      Algorithms by Dasgupta is a good reference.

  • Jatin

    In function display the below changes should be made–>
    //(base + r * col)1 should be replaced by *(base + r * col + c)

    • Venki

      @Jatin, thanks. It was typo during post update. I have updated the post.

  • PsychoCoder

    In the documentation of the table inside the program :

    leftCell = table[i][j-1] ;
    and
    topCell = table[i-1][j] ;

    It should be,
    leftCell = table[i-1][j] ;
    and
    topCell = table[i][j-1] ;

  • Ratan

    “Given strings SUNDAY and SATURDAY. We want to convert SUNDAY into SATURDAY with minimum edits. Let us pick i = 2 and j = 4 i.e. prefix strings are SUN and SATU respectively”

    in this line change i=3 or prefix as ‘SU’.

  • Doom

    Usually the costs D, I and R are not same. In such case the problem can be represented as an acyclic directed graph (DAG) with weights on each edge, and finding shortest path gives edit distance.
    How to construct this graph? could you plz give some basic steps? just the logic.

  • Anonymous
     int EDIT[100][100];
    int solve_edit( string a, string b) {
        for (int j=0;j<=b.size();j++) {
            EDIT[0][j]=j;
        }
        for (int i=1;i<=a.size();i++) {
            EDIT[i][0]=i;
            for (int j=1;j<=b.size();j++) {
                EDIT[i][j]= min( min( EDIT[i][j-1]+1,EDIT[i-1][j]+1),  EDIT[i-1][j-1]+ (int)(a[i-1]!=b[j-1]));
            }
        }
    
        return EDIT[a.size()][b.size()];
    }
     
  • rajcools

    in description its written —
    Combining all the subproblems minimum cost of aligning prefix strings ending at i and j given by

    E(i, j) = min( [E(i-1, j) + D], [E(i, j-1) + I], [E(i-1, j-1) + I if i,j characters are not same] )

    in —[E(i-1, j-1) + I if i,j characters are not same] )

    shouldnt here be replace(R) instead of Insert(I)

    else it would be two operations
    [E(i-1, j-1) + I +D … we insert one char from target string and delete from original string

    • @rajcools, thanks. It should be replace. I will update.

      • iitr.ankur

        Instead of Using DAG, can’t we simply define 3 different Edit Costs: Edit_Insert(ex. 1), Edit_Delete(2), Edit_Remove(5) and use these in the 3 cases??
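
The idea of separate costs per operation, raised in several comments, can be sketched directly in the table-filling approach. This is an illustration under stated assumptions, not the article's program: `edit_distance_weighted` and its parameters `ins`, `rem` and `rep` are hypothetical names, costs are assumed to be non-negative, and the minimum is taken over all three moves in every cell (a match is treated as a Replace of cost 0) rather than short-circuiting on equal characters.

```python
def edit_distance_weighted(str1, str2, ins=1, rem=1, rep=1):
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        dp[0][j] = j * ins          # build str2[:j] from an empty string
    for i in range(1, m + 1):
        dp[i][0] = i * rem          # remove all of str1[:i]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = 0 if str1[i - 1] == str2[j - 1] else rep
            dp[i][j] = min(dp[i][j - 1] + ins,           # Insert
                           dp[i - 1][j] + rem,           # Remove
                           dp[i - 1][j - 1] + diagonal)  # Replace or match
    return dp[m][n]

print(edit_distance_weighted("sunday", "saturday"))           # 3
print(edit_distance_weighted("cat", "cut", ins=1, rem=2, rep=5))  # 3
```

With unit costs this gives the same result as the programs in the article. In the second call a Replace costs 5, so removing 'a' (cost 2) and inserting 'u' (cost 1) is cheaper.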