Implementing Regular Expression Matching

Given two strings S and P where S consists of only lowercase English alphabets while P consists of lowercase English alphabets as well as special characters ‘.’ and ‘*’, the task is to implement a function to test regular expression such that:

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

Note: For each appearance of the character ‘*', there will be a previous valid character to match.

Examples:

Input: s = "aaa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aaa".

Input: s = "abb", p = "a.*"
Output: true
Explanation: replace . with b then p becomes ab* now replace * with one preceeding character hence p becomes abb.

Approach: To solve the problem follow the below observations:

Suppose: s = “aa” and p = “aa”, since all the characters in both s and p are the same, it’s a match.

Suppose: s = “aabb” and p = “aab*”, We know that substrings bb and b* match because * can be replaced by one b. Since we already know that the remaining substrings “aa” and “aa” are matched, the whole strings are also a match.

What can we infer from this? Right, if we have a solution of part of a problem, we can use that partial result and can go forward. Also, we can use the already calculated result without calculating it again.

Does this ring a bell ????? Yes, this problem satisfies the following two properties –

Optimal Substructure — Any problem has optimal substructure property if its overall optimal solution can be constructed from the optimal solutions of its subproblems.

Overlapping Subproblems — Any problem has overlapping sub-problems if finding its solution involves solving the same subproblem multiple times.

It is now evident that we can use Dynamic Programming to solve this problem. Below are the steps —

Create a boolean 2D dp array with sizes as boolean[][] dp = new boolean[s.length() + 1][p.length() + 1]. We are adding extra 1 to incorporate the case in case either or both of the strings are empty.
If both strings are empty, then it’s a match, thus, dp[0][0] = true.
Let’s take an example s = "aab" and p = "c*a*b" and create a DP table.

			c	*	a	*	b
		0	1	2	3	4	5
	0	TRUE	FALSE	TRUE	FALSE	TRUE	FALSE
a	1	FALSE	FALSE	FALSE	TRUE	TRUE	FALSE
a	2	FALSE	FALSE	FALSE	FALSE	TRUE	FALSE
b	3	FALSE	FALSE	FALSE	FALSE	FALSE	TRUE

First column — it means p is empty and it will match to s only if s is also empty which we have stored in dp[0][0]. Thus, remaining values of dp[0][i] will be false.
First row — this is not so easy. It means which p matches empty s. The answer is either an empty pattern or a pattern that represents an empty string such as "a*", "x*y*", "l*m*n*" and so on. In the above example, if s = "" and p = "c*", then due to *, c can be replaced by 0 cs which gives us an empty string. Hence, dp[0][2] = true.
For non-empty strings, let’s say that s[i - 1] == p[j - 1] this means the (i – 1)^th and (j – 1)^th characters are same. This means, we have to check if the remaining strings are a match or not. If they are a match, then the current substrings will be a match, otherwise they won’t be a match i.e., dp[i][j] = dp[i - 1][j - 1]. We’re taking (i – 1)^th and (j – 1)^th characters to offset empty strings as we’re assuming our strings start from index 1.
If p[j - 1] == ".", then it means any single character can be matched. Therefore, here also, we will have to check if the remaining string is a match or not. Thus, dp[i][j] = dp[i - 1][j - 1].
If p[j - 1] == "*", then it means either it’s represents an empty string (0 characters), thus dp[i][j] = dp[i][j - 2] or s[i - 1] == p[j - 2] || p[j - 2] == ".", then current character of string equals the char preceding ‘*' in pattern so the result is dp[i-1][j].

Below is the implementation of above approach:

C++

#include <iostream>
#include <vector>

using namespace std;
 
bool isMatch(string s, string p) {

    int rows = s.length();

    int columns = p.length();
 
    // Base conditions

    if (rows == 0 && columns == 0) {

        return true;

    }

    if (columns == 0) {

        return false;

    }
 
    // DP array

    vector<vector<bool>> dp(rows + 1, vector<bool>(columns + 1, false));
 
    // Empty string and empty pattern are a match

    dp[0][0] = true;
 
    // Deals with patterns with *

    for (int i = 2; i <= columns; i++) {

        if (p[i - 1] == '*') {

            dp[0][i] = dp[0][i - 2];

        }

    }
 
    // For remaining characters

    for (int i = 1; i <= rows; i++) {

        for (int j = 1; j <= columns; j++) {

            if (s[i - 1] == p[j - 1] || p[j - 1] == '.') {

                dp[i][j] = dp[i - 1][j - 1];

            } else if (j > 1 && p[j - 1] == '*') {

                dp[i][j] = dp[i][j - 2];

                if (p[j - 2] == '.' || p[j - 2] == s[i - 1]) {

                    dp[i][j] = dp[i][j] || dp[i - 1][j];

                }

            }

        }

    }

    return dp[rows][columns];
}
 
int main() {

    cout << boolalpha << isMatch("aab", "a.*") << endl;

    return 0;
}

Java

// Java Code for above approach

import java.io.*;

import java.util.Scanner;

public class RegularExpressionMatching {
 
    public static boolean isMatch(String s, String p)

    {

        int rows = s.length();

        int columns = p.length();
 
        /// Base conditions

        if (rows == 0 && columns == 0) {

            return true;

        }

        if (columns == 0) {

            return false;

        }
 
        // DP array

        boolean[][] dp = new boolean[rows + 1][columns + 1];
 
        // Empty string and empty pattern

        // are a match

        dp[0][0] = true;
 
        // Deals with patterns with *

        for (int i = 2; i < columns + 1; i++) {

            if (p.charAt(i - 1) == '*') {

                dp[0][i] = dp[0][i - 2];

            }

        }
 
        // For remaining characters

        for (int i = 1; i < rows + 1; i++) {

            for (int j = 1; j < columns + 1; j++) {

                if (s.charAt(i - 1) == p.charAt(j - 1)

                    || p.charAt(j - 1) == '.') {

                    dp[i][j] = dp[i - 1][j - 1];

                }

                else if (j > 1 && p.charAt(j - 1) == '*') {

                    dp[i][j] = dp[i][j - 2];

                    if (p.charAt(j - 2) == '.'

                        || p.charAt(j - 2)

                               == s.charAt(i - 1)) {

                        dp[i][j] = dp[i][j] | dp[i - 1][j];

                    }

                }

            }

        }

        return dp[rows][columns];

    }
 
    public static void main(String[] args)

    {

        System.out.println(isMatch("aab", "a.*"));

    }
}

Python3

def isMatch(s: str, p: str) -> bool:

    rows, columns = (len(s), len(p))

    # Base conditions

    if rows == 0 and columns == 0:

        return True

    if columns == 0:

        return False

    # DP array

    dp = [[False for j in range(columns + 1)] for i in range(rows + 1)]

    # Since empty string and empty pattern are a match

    dp[0][0] = True

    # Deals with patterns containing *

    for i in range(2, columns + 1):

        if p[i - 1] == '*':

            dp[0][i] = dp[0][i - 2]

    # For remaining characters

    for i in range(1, rows + 1):

        for j in range(1, columns + 1):

            if s[i - 1] == p[j - 1] or p[j - 1] == '.':

                dp[i][j] = dp[i - 1][j - 1]

            elif j > 1 and p[j - 1] == '*':

                dp[i][j] = dp[i][j - 2]

                if p[j - 2] == '.' or p[j - 2] == s[i - 1]:

                    dp[i][j] = dp[i][j] or dp[i - 1][j]

    return dp[rows][columns]

if __name__ == '__main__':

    print(isMatch("aa", "a"))

    print(isMatch("aa", "a*"))

    print(isMatch("ab", "."))

    print(isMatch("aab", "c*a*b"))

    print(isMatch("mississippi", "mis*is*p*."))

    print(isMatch("", ".*"))

using System;
 
public class RegularExpressionMatching
{

    public static bool IsMatch(string s, string p)

    {

        int rows = s.Length;

        int columns = p.Length;
 
        // Base conditions

        if (rows == 0 && columns == 0)

        {

            return true;

        }

        if (columns == 0)

        {

            return false;

        }
 
        // DP array

        bool[][] dp = new bool[rows + 1][];

        for (int i = 0; i <= rows; i++)

        {

            dp[i] = new bool[columns + 1];

        }
 
        // Empty string and empty pattern are a match

        dp[0][0] = true;
 
        // Deals with patterns with *

        for (int i = 2; i <= columns; i++)

        {

            if (p[i - 1] == '*')

            {

                dp[0][i] = dp[0][i - 2];

            }

        }
 
        // For remaining characters

        for (int i = 1; i <= rows; i++)

        {

            for (int j = 1; j <= columns; j++)

            {

                if (s[i - 1] == p[j - 1] || p[j - 1] == '.')

                {

                    dp[i][j] = dp[i - 1][j - 1];

                }

                else if (j > 1 && p[j - 1] == '*')

                {

                    dp[i][j] = dp[i][j - 2];

                    if (p[j - 2] == '.' || p[j - 2] == s[i - 1])

                    {

                        dp[i][j] = dp[i][j] || dp[i - 1][j];

                    }

                }

            }

        }
 
        return dp[rows][columns];

    }
 
    public static void Main(string[] args)

    {

        Console.WriteLine(IsMatch("aab", "a.*"));

    }
}

Javascript

var isMatch = function (s, p) {

    const rows = s.length;

    const columns = p.length;

    /// Base conditions

    if (rows == 0 && columns == 0) {

        return true;

    }

    if (columns == 0) {

        return false;

    }

    // DP array

    const dp = Array.from({ length: s.length + 1 }, () => [false]);

    // Empty string and empty pattern are a match

    dp[0][0] = true;

    // Deals with patterns with *

    for (let i = 1; i < columns + 1; i++) {

        if (p[i - 1] === '*') {

            dp[0][i] = dp[0][i - 2];

        }

        else {

            dp[0][i] = false;

        };

    }

    // For remaining characters

    for (let i = 1; i < rows + 1; i++) {

        for (let j = 1; j < columns + 1; j++) {

            if (p[j - 1] === '*') {

                if (p[j - 2] === s[i - 1] || p[j - 2] === '.') {

                    dp[i][j] = dp[i][j - 2] || dp[i - 1][j];

                } else {

                    dp[i][j] = dp[i][j - 2];

                }

            } else if (p[j - 1] === s[i - 1] || p[j - 1] === '.') {

                dp[i][j] = dp[i - 1][j - 1];

            } else {

                dp[i][j] = false;

            }

        }

    }

    return dp[rows][columns];
};
 
console.log(isMatch("aa", "a"));
console.log(isMatch("aa", "a*"));
console.log(isMatch("an", "."));
console.log(isMatch("aab", "c*a*b"));
console.log(isMatch("mississippi", "mis*is*p*."));
console.log(isMatch("", ".*"));
console.log(isMatch("ab", ".*c"));

Output

true

Time Complexity: O(m×n)
Auxiliary Space: O(m×n)

Article Tags :

DSA

Dynamic Programming

Facebook

regular-expression