Perl – Use of Capturing in Regular Expressions

A regular expression (regex) is a string of characters that defines a search pattern, which is then matched against a given text.
Perl lets us group portions of a pattern together into a subpattern, and it also remembers the part of the string matched by each subpattern. This behaviour is known as capturing.

Regular expressions are how we find matches in a string, and those matches become even more useful when we extract them from the string for further processing.

Perl makes it easy to extract the parts of a string that have matched: put parentheses () around the relevant part of the regular expression. For each set of capturing parentheses, Perl stores the matched text in a special variable: $1, $2, $3, and so on.

Example:

use warnings;
use strict;
  
# Using the localtime() function
# to get the local time
my $time = localtime(); 
  
print $time, "\n";
  
# Using a regex to capture the time (HH:MM:SS);
# the captured match is available in
# the special variable $1
print "$1\n" if $time =~ /(\d\d:\d\d:\d\d)/;
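
With several sets of parentheses, each capture gets its own variable. The following is a minimal sketch, assuming a hypothetical fixed time string in place of localtime() so that the output is predictable. It also shows that a match in list context returns the captured strings directly:

```perl
use strict;
use warnings;

# Hypothetical fixed string in the format localtime() returns,
# used instead of localtime() so the output is predictable
my $time = "Mon Jan  1 09:05:30 2024";

# Three capturing groups: Perl stores them in $1, $2 and $3
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {
    # prints: hours = 09, minutes = 05, seconds = 30
    print "hours = $1, minutes = $2, seconds = $3\n";
}

# In list context, a successful match returns the captured strings directly
my ($hours, $minutes, $seconds) = $time =~ /(\d\d):(\d\d):(\d\d)/;
print "$hours:$minutes:$seconds\n";   # prints: 09:05:30
```

The list-context form is handy when the captures are needed later, because $1, $2 and $3 are overwritten by the next successful match.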


Named Captures

Named captures let us attach a name to a capture group, so that the portion of the string it matched can be retrieved later by that name instead of by its position. For example, a phone number extracted from contact information can be stored under a descriptive name.

The basic syntax for a named capture is:

(?<name>…)

The parentheses enclose the capture. The ?<name> construct must immediately follow the opening parenthesis and gives that capture its name; the name must be a valid identifier (letters, digits and underscores, not starting with a digit). The rest of the group, up to the closing parenthesis, is an ordinary regular expression.
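
The following is a minimal sketch of the phone-number case. It assumes a hypothetical contact string and a simple 555-0123 style number, and the capture name phone is chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical contact information
my $contact = "Name: Vishal Raina, Phone: 555-0123";

my $phone;
# (?<phone>...) names the capture; inside it is an ordinary regex
if ($contact =~ /Phone:\s*(?<phone>\d{3}-\d{4})/) {
    # Read the named capture back through the %+ hash
    $phone = $+{phone};
    print "phone = $phone\n";
    # A named group is still a capture group, so $1 holds the same text
    print "via \$1: $1\n";
}
```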

When the enclosing pattern matches successfully, Perl updates the special hash %+. In this hash, each key is the name of a capture and each value is the portion of the string that the capture matched.
Named captures often make a regex easier to maintain. Although Perl has supported them since version 5.10, they are used less frequently than numbered captures.
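
Because %+ can be read like an ordinary hash, its keys can be listed or the whole hash can be copied. The following is a minimal sketch using a hypothetical date string and the illustrative capture names year, month and day:

```perl
use strict;
use warnings;

# Hypothetical date string
my $date = "2024-03-15";

my %parts;
if ($date =~ /^(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)$/) {
    # Copy %+ into an ordinary hash; %+ itself always reflects
    # the most recent successful match
    %parts = %+;
}

# Each key is a capture name, each value the text it matched
for my $name (sort keys %parts) {
    print "$name = $parts{$name}\n";
}
```

Copying is a design choice here. It keeps the values available after later matches have replaced the contents of %+.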

Example:

use strict;
use warnings;

# Extracting parts of the string with named captures
my $str = "The brown fox jumps over the lazy dog";

if ($str =~ /the (?<color>\S+) (?<animal>\S+)/i)
{
    # %+ maps each capture name to the text it matched
    print "color = $+{color}, animal = $+{animal}\n";
}


Output:

color = brown, animal = fox

Numbered Captures

Numbered captures provide no identifying name and add nothing to %+. Instead, Perl stores each captured string in a series of special variables: the first capture goes into $1, the second into $2, and so on. Captures are numbered in the order of their opening parentheses, counting from the left, so the group whose opening parenthesis appears first is stored in $1, the next one in $2, and so on.
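
The following is a minimal sketch of the counting rule with nested groups, using a hypothetical input string. The outer group opens first, so it becomes $1 even though it closes last:

```perl
use strict;
use warnings;

# Hypothetical input string
my $text = "Call 555-0123 today";

my @groups;
# Groups are numbered by the position of their opening parenthesis:
#   1st "(" -> $1: the whole number      ((\d{3})-(\d{4}))
#   2nd "(" -> $2: the prefix             (\d{3})
#   3rd "(" -> $3: the last four digits   (\d{4})
if ($text =~ /((\d{3})-(\d{4}))/) {
    @groups = ($1, $2, $3);
    print "\$1 = $1\n";   # $1 = 555-0123
    print "\$2 = $2\n";   # $2 = 555
    print "\$3 = $3\n";   # $3 = 0123
}
```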

The syntax for named captures is longer than that for numbered captures, but it also provides extra clarity. Regexes that rely on numbered captures are harder to maintain, because inserting a new group renumbers every group after it. Numbered captures are still useful in simple substitutions, where named captures would only add unnecessary code.
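
The following is a minimal sketch of such a simple substitution, reusing the name Vishal Raina. The capture names first and last in the second version are illustrative. Both lines produce the same result, but the numbered version is shorter:

```perl
use strict;
use warnings;

my $name = "Vishal Raina";

# Numbered captures keep a simple swap short:
# "Forename Surname" -> "Surname, Forename"
(my $swapped = $name) =~ s/^(\S+)\s+(\S+)$/$2, $1/;

# The same substitution written with named captures is noticeably longer
(my $swapped_named = $name) =~ s/^(?<first>\S+)\s+(?<last>\S+)$/$+{last}, $+{first}/;

print "$swapped\n";         # Raina, Vishal
print "$swapped_named\n";   # Raina, Vishal
```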

Example:

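use strict;
use warnings;

# Extracting forename and surname
print "Please enter your name \n";

# Input is hard-coded here; to read it from the keyboard instead,
# use: chomp(my $name = <STDIN>);
my $name = "Vishal Raina ";

if ($name =~ /^\s*(\S+)\s+(\S+)\s*$/)
{
    print "Hi $1. Your Surname is $2.";
}
else
{
    print "Error";
}
print "\n";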


Output:

Please enter your name 
Hi Vishal. Your Surname is Raina.
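
The examples above read $1 and $2 only inside a branch that runs when the match succeeds. This matters because Perl does not reset the numbered variables when a match fails. The following is a minimal sketch, using hypothetical input strings, that shows a stale value surviving a failed match:

```perl
use strict;
use warnings;

my $full  = "Vishal Raina";
my $empty = "";

$full =~ /^(\S+)/;          # succeeds: $1 is "Vishal"
my $first = $1;

$empty =~ /^(\S+)/;         # fails: Perl does NOT reset $1
my $after_failure = $1;     # still "Vishal", left over from the earlier match

print "first match:        $first\n";
print "after failed match: $after_failure\n";
```

Guarding with if (or using the list-context form) avoids silently reusing a value from an earlier match.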


