Perl – Use of Capturing in Regular Expressions
A regular expression (regex) is a string of characters that defines a search pattern, which is then matched against a given text.
Perl allows us to group portions of a pattern into subpatterns and remembers the string matched by each subpattern. This behaviour is known as capturing.
Finding a match is only the first step. Matches are most useful when we can pull the matched text out of the string for further processing.
Perl makes it easy to extract the parts of a string that matched: put parentheses () around the relevant part of the regular expression. After a successful match, Perl stores the text matched by each set of capturing parentheses in the special variables $1, $2, $3, and so on.
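As a minimal sketch of this behaviour, the snippet below captures two parts of a string with two sets of parentheses. The helper name parse_pair and the "key=value" input are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "key=value" string into its two parts.
sub parse_pair {
    my ($text) = @_;
    # Two sets of capturing parentheses: $1 gets the key, $2 the value.
    if ($text =~ /^(\w+)=(\w+)$/) {
        return ($1, $2);
    }
    return;    # empty list when the pattern does not match
}

my ($key, $value) = parse_pair('lang=perl');
print "key: $key, value: $value\n";    # prints "key: lang, value: perl"
```

Checking the result of the match before reading $1 and $2 matters, because after a failed match these variables still hold the values from the last successful match.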
Named captures let us label the captured portions of a match and refer to them later by name rather than by position. For example: extracting a phone number from contact information.
The basic syntax for a named capture is:
(?<capture name> …)
Parentheses enclose the capture. The ?<name> construct must immediately follow the opening parenthesis and gives that capture its name. The rest of the group is an ordinary regular expression.
When the enclosing pattern matches successfully, Perl updates the special hash %+. Its keys are the capture names, and its values are the portions of the string matched by the corresponding captures.
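The phone number case mentioned earlier can be sketched as follows. The helper name extract_phone, the capture names, and the NNN-NNN-NNNN layout are all assumptions for this example, not something the text prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a phone number out of a contact line.
# The NNN-NNN-NNNN layout is an assumption made for this sketch.
sub extract_phone {
    my ($contact) = @_;
    if ($contact =~ /(?<area>\d{3})-(?<exchange>\d{3})-(?<line>\d{4})/) {
        # %+ maps each capture name to the text it matched.
        # Copy it, because %+ changes on the next successful match.
        my %found = %+;
        return \%found;
    }
    return;    # undef in scalar context when nothing matched
}

my $phone = extract_phone('Name: A. Reader, Tel: 555-010-4321');
print "area = $phone->{area}, line = $phone->{line}\n";
# prints "area = 555, line = 4321"
```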
Named captures often improve regex maintainability. They are available in Perl 5.10 and later, though in practice they are used less often than numbered captures, and mostly in top-level regexes rather than in small regex fragments.
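The sample output that follows could be produced by a sketch like this one. The input sentence, the helper name describe, and the list of colours are assumptions, since only the output line is shown.

```perl
use strict;
use warnings;

# Assumed input sentence; only the resulting output appears in the text.
my $sentence = 'The quick brown fox jumps over the lazy dog';

# Hypothetical helper: find a colour followed by an animal name.
sub describe {
    my ($text) = @_;
    if ($text =~ /(?<color>brown|red|black)\s+(?<animal>\w+)/) {
        # Read the captures by name from %+ instead of by position.
        return sprintf 'color = %s, animal = %s', $+{color}, $+{animal};
    }
    return 'no match';
}

print describe($sentence), "\n";
```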
Output:
color = brown, animal = fox
Numbered captures neither provide an identifying name nor affect %+. Instead, Perl stores each captured string in a series of special variables: the first capture goes into $1, the second into $2, and so on. Captures are numbered by the position of their opening parenthesis, counting from the left. The group whose opening parenthesis comes first captures into $1, the next into $2, and so on, even when groups are nested.
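The counting rule is easiest to see with nested groups. In this sketch the helper name split_date and the year-month input are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captures from a date-like string.
sub split_date {
    my ($text) = @_;
    # Opening parentheses, counted left to right:
    #   1st "(" wraps the whole date -> $1
    #   2nd "(" wraps the year       -> $2
    #   3rd "(" wraps the month      -> $3
    if ($text =~ /((\d{4})-(\d{2}))/) {
        return ($1, $2, $3);
    }
    return;
}

my @parts = split_date('released 2024-05');
print join(', ', @parts), "\n";    # prints "2024-05, 2024, 05"
```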
The syntax for named captures is longer than that of numbered captures, but it provides extra clarity. Numbered captures are harder to maintain, because the numbers shift whenever a group is added or removed. They remain useful in simple substitutions, where a named capture would unnecessarily take more code.
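The sample run that follows appears to come from an interactive script that asks for a name. Below is a minimal sketch of the same idea with numbered captures. It uses a fixed string in place of reading STDIN, and the helper names greet and swap_names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: greet someone given "First Last".
sub greet {
    my ($full_name) = @_;
    # $1 is the first word, $2 the second.
    if ($full_name =~ /^\s*(\w+)\s+(\w+)/) {
        return "Hi $1. Your Surname is $2.";
    }
    return 'Could not find a first name and a surname.';
}

# Hypothetical helper: a simple substitution using numbered captures.
sub swap_names {
    my ($full_name) = @_;
    (my $swapped = $full_name) =~ s/^(\w+)\s+(\w+)$/$2, $1/;
    return $swapped;
}

my $input = 'Vishal Raina';    # fixed input instead of reading <STDIN>
print greet($input), "\n";     # prints "Hi Vishal. Your Surname is Raina."
```

The substitution helper shows why numbered captures suit short rewrites: swap_names('Vishal Raina') returns "Raina, Vishal", and naming the two groups would add length without adding clarity.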
Output:
Please enter your name
Hi Vishal. Your Surname is Raina.