Open In App

Implementing Regular Expression Matching

Last Updated : 21 Oct, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

Given two strings S and P where S consists of only lowercase English alphabets while P consists of lowercase English alphabets as well as special characters ‘.’ and ‘*’, the task is to implement a function to test regular expression such that:

  • '.' Matches any single character.​​​​
  • '*' Matches zero or more of the preceding element.

Note: For each appearance of the character ‘*', there will be a previous valid character to match.

Examples:

Input: s = "aaa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aaa".

Input: s = "abb", p = "a.*"
Output: true
Explanation: replace . with b then p becomes ab* now replace * with one preceeding character hence p becomes abb.

Approach: To solve the problem follow the below observations:

  • Suppose: s = “aa” and p = “aa”, since all the characters in both s and p are the same, it’s a match.
  • Suppose:  s = “aabb” and p = “aab*”, We know that substrings bb and b* match because * can be replaced by one b. Since we already know that the remaining substrings “aa” and “aa” are matched, the whole strings are also a match.

What can we infer from this? Right, if we have a solution of part of a problem, we can use that partial result and can go forward. Also, we can use the already calculated result without calculating it again.

Does this ring a bell ????? Yes, this problem satisfies the following two properties –

  • Optimal Substructure — Any problem has optimal substructure property if its overall optimal solution can be constructed from the optimal solutions of its subproblems.
  • Overlapping Subproblems — Any problem has overlapping sub-problems if finding its solution involves solving the same subproblem multiple times.

It is now evident that we can use Dynamic Programming to solve this problem. Below are the steps —

  • Create a boolean 2D dp array with sizes as boolean[][] dp = new boolean[s.length() + 1][p.length() + 1]. We are adding extra 1 to incorporate the case in case either or both of the strings are empty.
  • If both strings are empty, then it’s a match, thus, dp[0][0] = true.
  • Let’s take an example s = "aab" and p = "c*a*b" and create a DP table.

c

*

a

*

b

0

1

2

3

4

5

0

TRUE

FALSE

TRUE

FALSE

TRUE

FALSE

a

1

FALSE

FALSE

FALSE

TRUE

TRUE

FALSE

a

2

FALSE

FALSE

FALSE

FALSE

TRUE

FALSE

b

3

FALSE

FALSE

FALSE

FALSE

FALSE

TRUE

  • First column — it means p is empty and it will match to s only if s is also empty which we have stored in dp[0][0]. Thus, remaining values of dp[0][i] will be false.
  • First row — this is not so easy. It means which p matches empty s. The answer is either an empty pattern or a pattern that represents an empty string such as "a*""x*y*""l*m*n*" and so on. In the above example, if s = "" and p = "c*", then due to *c can be replaced by 0 cs which gives us an empty string. Hence, dp[0][2] = true.
  • For non-empty strings, let’s say that s[i - 1] == p[j - 1] this means the (i – 1)th and (j – 1)th characters are same. This means, we have to check if the remaining strings are a match or not. If they are a match, then the current substrings will be a match, otherwise they won’t be a match i.e., dp[i][j] = dp[i - 1][j - 1]. We’re taking (i – 1)th and (j – 1)th characters to offset empty strings as we’re assuming our strings start from index 1.
  • If p[j - 1] == ".", then it means any single character can be matched. Therefore, here also, we will have to check if the remaining string is a match or not. Thus, dp[i][j] = dp[i - 1][j - 1].
  • If p[j - 1] == "*", then it means either it’s represents an empty string (0 characters), thus dp[i][j] = dp[i][j - 2] or s[i - 1] == p[j - 2] || p[j - 2] == ".", then current character of string equals the char preceding *' in pattern so the result is dp[i-1][j].

Below is the implementation of above approach:

C++




#include <iostream>
#include <vector>
using namespace std;
 
bool isMatch(string s, string p) {
    int rows = s.length();
    int columns = p.length();
 
    // Base conditions
    if (rows == 0 && columns == 0) {
        return true;
    }
    if (columns == 0) {
        return false;
    }
 
    // DP array
    vector<vector<bool>> dp(rows + 1, vector<bool>(columns + 1, false));
 
    // Empty string and empty pattern are a match
    dp[0][0] = true;
 
    // Deals with patterns with *
    for (int i = 2; i <= columns; i++) {
        if (p[i - 1] == '*') {
            dp[0][i] = dp[0][i - 2];
        }
    }
 
    // For remaining characters
    for (int i = 1; i <= rows; i++) {
        for (int j = 1; j <= columns; j++) {
            if (s[i - 1] == p[j - 1] || p[j - 1] == '.') {
                dp[i][j] = dp[i - 1][j - 1];
            } else if (j > 1 && p[j - 1] == '*') {
                dp[i][j] = dp[i][j - 2];
                if (p[j - 2] == '.' || p[j - 2] == s[i - 1]) {
                    dp[i][j] = dp[i][j] || dp[i - 1][j];
                }
            }
        }
    }
    return dp[rows][columns];
}
 
int main() {
    cout << boolalpha << isMatch("aab", "a.*") << endl;
    return 0;
}


Java




// Java Code for above approach
import java.io.*;
import java.util.Scanner;
public class RegularExpressionMatching {
 
    public static boolean isMatch(String s, String p)
    {
        int rows = s.length();
        int columns = p.length();
 
        /// Base conditions
        if (rows == 0 && columns == 0) {
            return true;
        }
        if (columns == 0) {
            return false;
        }
 
        // DP array
        boolean[][] dp = new boolean[rows + 1][columns + 1];
 
        // Empty string and empty pattern
        // are a match
        dp[0][0] = true;
 
        // Deals with patterns with *
        for (int i = 2; i < columns + 1; i++) {
            if (p.charAt(i - 1) == '*') {
                dp[0][i] = dp[0][i - 2];
            }
        }
 
        // For remaining characters
        for (int i = 1; i < rows + 1; i++) {
            for (int j = 1; j < columns + 1; j++) {
                if (s.charAt(i - 1) == p.charAt(j - 1)
                    || p.charAt(j - 1) == '.') {
                    dp[i][j] = dp[i - 1][j - 1];
                }
                else if (j > 1 && p.charAt(j - 1) == '*') {
                    dp[i][j] = dp[i][j - 2];
                    if (p.charAt(j - 2) == '.'
                        || p.charAt(j - 2)
                               == s.charAt(i - 1)) {
                        dp[i][j] = dp[i][j] | dp[i - 1][j];
                    }
                }
            }
        }
        return dp[rows][columns];
    }
 
    public static void main(String[] args)
    {
        System.out.println(isMatch("aab", "a.*"));
    }
}


Python3




def isMatch(s: str, p: str) -> bool:
    rows, columns = (len(s), len(p))
    # Base conditions
    if rows == 0 and columns == 0:
        return True
    if columns == 0:
        return False
    # DP array
    dp = [[False for j in range(columns + 1)] for i in range(rows + 1)]
    # Since empty string and empty pattern are a match
    dp[0][0] = True
    # Deals with patterns containing *
    for i in range(2, columns + 1):
        if p[i - 1] == '*':
            dp[0][i] = dp[0][i - 2]
    # For remaining characters
    for i in range(1, rows + 1):
        for j in range(1, columns + 1):
            if s[i - 1] == p[j - 1] or p[j - 1] == '.':
                dp[i][j] = dp[i - 1][j - 1]
            elif j > 1 and p[j - 1] == '*':
                dp[i][j] = dp[i][j - 2]
                if p[j - 2] == '.' or p[j - 2] == s[i - 1]:
                    dp[i][j] = dp[i][j] or dp[i - 1][j]
    return dp[rows][columns]
   
if __name__ == '__main__':
    print(isMatch("aa", "a"))
    print(isMatch("aa", "a*"))
    print(isMatch("ab", "."))
    print(isMatch("aab", "c*a*b"))
    print(isMatch("mississippi", "mis*is*p*."))
    print(isMatch("", ".*"))


C#




using System;
 
public class RegularExpressionMatching
{
    public static bool IsMatch(string s, string p)
    {
        int rows = s.Length;
        int columns = p.Length;
 
        // Base conditions
        if (rows == 0 && columns == 0)
        {
            return true;
        }
        if (columns == 0)
        {
            return false;
        }
 
        // DP array
        bool[][] dp = new bool[rows + 1][];
        for (int i = 0; i <= rows; i++)
        {
            dp[i] = new bool[columns + 1];
        }
 
        // Empty string and empty pattern are a match
        dp[0][0] = true;
 
        // Deals with patterns with *
        for (int i = 2; i <= columns; i++)
        {
            if (p[i - 1] == '*')
            {
                dp[0][i] = dp[0][i - 2];
            }
        }
 
        // For remaining characters
        for (int i = 1; i <= rows; i++)
        {
            for (int j = 1; j <= columns; j++)
            {
                if (s[i - 1] == p[j - 1] || p[j - 1] == '.')
                {
                    dp[i][j] = dp[i - 1][j - 1];
                }
                else if (j > 1 && p[j - 1] == '*')
                {
                    dp[i][j] = dp[i][j - 2];
                    if (p[j - 2] == '.' || p[j - 2] == s[i - 1])
                    {
                        dp[i][j] = dp[i][j] || dp[i - 1][j];
                    }
                }
            }
        }
 
        return dp[rows][columns];
    }
 
    public static void Main(string[] args)
    {
        Console.WriteLine(IsMatch("aab", "a.*"));
    }
}


Javascript




var isMatch = function (s, p) {
    const rows = s.length;
    const columns = p.length;
    /// Base conditions
    if (rows == 0 && columns == 0) {
        return true;
    }
    if (columns == 0) {
        return false;
    }
    // DP array
    const dp = Array.from({ length: s.length + 1 }, () => [false]);
    // Empty string and empty pattern are a match
    dp[0][0] = true;
    // Deals with patterns with *
    for (let i = 1; i < columns + 1; i++) {
        if (p[i - 1] === '*') {
            dp[0][i] = dp[0][i - 2];
        }
        else {
            dp[0][i] = false;
        };
    }
    // For remaining characters
    for (let i = 1; i < rows + 1; i++) {
        for (let j = 1; j < columns + 1; j++) {
            if (p[j - 1] === '*') {
                if (p[j - 2] === s[i - 1] || p[j - 2] === '.') {
                    dp[i][j] = dp[i][j - 2] || dp[i - 1][j];
                } else {
                    dp[i][j] = dp[i][j - 2];
                }
            } else if (p[j - 1] === s[i - 1] || p[j - 1] === '.') {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = false;
            }
        }
    }
    return dp[rows][columns];
};
 
console.log(isMatch("aa", "a"));
console.log(isMatch("aa", "a*"));
console.log(isMatch("an", "."));
console.log(isMatch("aab", "c*a*b"));
console.log(isMatch("mississippi", "mis*is*p*."));
console.log(isMatch("", ".*"));
console.log(isMatch("ab", ".*c"));


Output

true







Time Complexity: O(m×n)
Auxiliary Space: O(m×n)



Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads