How to check a valid regex string using Python?
A Regex (Regular Expression) is a sequence of characters used for defining a pattern. This pattern could be used for searching, replacing and other operations. Regex is extensively utilized in applications that require input validation, Password validation, Pattern Recognition, search and replace utilities (found in word processors) etc. This is due to the fact that regex syntax stays the same across different programming languages and implementations. Therefore, one having the grasp of it provides longevity across languages. In this article, we will be creating a program for checking the validity of a regex string.
The method we would be using will require a firm understanding of the try-except construct of python. Therefore, it would be wise if we touch upon that before moving over to the actual code.
Try except block is used to catch and handle exceptions encountered during the execution of a particular block of code (construct exists in other programming languages under the name try-catch). The general syntax of a try-except block is as follows:
# Execute this code
# Execute this code, if an exception arises during the execution of the try block
In the above syntax, any code found within the try block would be executed. If an exception/error arises during the execution of the try block then (only) the except block is executed. If the try block executes without producing an exception, then the except block won’t be executed. If a bare except clause is used, then it would catch any exception (and certain even System_Exits) encountered during the execution of try block. To prevent such from happening, it is generally a good practice to specify an exception after the except. This ensures that only after encountering that specific exception/error, the except block will execute. This prevents concealment of other errors encountered during the execution of the try block. Also, multiple except clauses can be used within the same try-except block, this enables it to trap a plethora of exceptions, and deal with them specifically. This construct contains other keywords as well such as finally, else etc. which aren’t required in current context. Therefore, only the relevant sections are described.
In the following code, we would be specifying re.error as the exception in the except clause of the try-except block. This error is encountered when an invalid regex pattern is found, during the compilation of the pattern.
Non valid regex pattern
Firstly we imported the re library, for enabling regex functionality in our code. Then we assigned a string containing the regex pattern to the variable pattern. The pattern provided is invalid as it contains an unclosed character class (in regex square brackets `[ ]`are used for defining a character class). We placed the re.compile() (used to compile regex patterns) function within the try block. This will firstly try to compile the pattern and if any exception occurs during the compilation, it would firstly check whether it is re.error, if it is then only the except block will execute. Otherwise, the exception will be displayed in the traceback, leading to program termination. The except block contains print statements that outputs the user defined message to the stdout and then exits the program (via exit()). Since the pattern provided is invalid (explained earlier) this lead to the except block getting executed.
The above code only deals with re.error exception. But other exceptions related to regex also exist such as RecursionError, Which needs to be dealt with specifically(by adding a separate except clause for that exception as well or changing the maximum stack depth using sys.setrecursionlimit() as of this case).
Checking whether the input string matches the Regex pattern
In the following example, we will test whether an input string matches a given regex pattern or not. This is assuming the regex pattern is a valid one (could be ensured using the aforementioned example). We would be checking whether the input string is a alphanumeric string (one containing alphabets or digits throughout its length) or not. We would be using the following class for checking the string:
Even though there exists a special sequence in regex (\w)for finding alphanumeric characters. But we won’t be using it as it contains the underscore ( _ ) in its character class (A-Za-z0-9_), which isn’t considered as an alphanumeric character under most standards.
> Enter the string: DeepLearnedSupaSampling01 'DeepLearnedSupaSampling01' is an alphanumeric string!
> Enter the string: afore 89df 'afore 89df' is NOT a alphanumeric string!
Firstly, we compiled the regex pattern using re.compile(). The Regex pattern contains a character set which is used to specify that all alphabets (lowercase and uppercase) and digits (0-9) are to be included in the pattern. Following the class is the plus sign ( + ) which is a repetition qualifier. This allows the resulting Regular Expression search to match 1 or more repetitions of the preceding Regular Expression (for us which is the alphanumeric character set). Then we prompt the user for the input string. After which we passed the input string and compiled regex pattern to re.fullmatch(). This method checks if the whole string matches the regular expression pattern or not. If it does then it returns 1, otherwise a 0. Which we used inside the if-else construct to display the success/failure message accordingly.