Perl – Use of Capturing in Regular Expressions

Last Updated : 01 Aug, 2020

A regular expression, or regex, is a string of characters that defines a search pattern to be matched against a given text.
Perl allows us to group portions of a pattern into subpatterns, and it remembers the string matched by each subpattern. This behaviour is known as capturing.

Finding a match in a string is the job of the regular expression itself. The matches become more useful when we can take the matched text out of the string for further processing.

Perl makes it easy to extract the parts of a string that matched: put parentheses () around the relevant part of the regular expression. After a successful match, Perl stores the text matched by each set of capturing parentheses in the special variables $1, $2, $3, and so on.

Example:

use warnings; 
use strict; 
  
# Using the localtime() function 
# to get the local time 
my $time = localtime();  
  
print $time, "\n"; 
  
# Using regex to capture time data 
# Accessing the captured match  
# using the special variable $1 
print "$1\n" if $time =~ /(\d\d:\d\d:\d\d)/;
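
The same idea extends to several groups in one pattern. The following is a minimal sketch, assuming a fixed sample timestamp (made up here so that the result is repeatable) rather than the live value of localtime(). It also shows that a match in list context returns the captured strings directly, which avoids touching $1, $2 and $3 at all.

```perl
use strict;
use warnings;

# A fixed timestamp (hypothetical sample data) so the result is repeatable
my $stamp = "Sat Aug  1 14:05:09 2020";

# Three capture groups: hours, minutes, seconds
if ($stamp =~ /(\d\d):(\d\d):(\d\d)/) {
    print "hours = $1, minutes = $2, seconds = $3\n";
}

# In list context the match returns the captured strings directly
my ($h, $m, $s) = $stamp =~ /(\d\d):(\d\d):(\d\d)/;
print "list context: $h-$m-$s\n";
```

Checking that the match succeeded before reading $1 matters, because a failed match leaves the capture variables holding the values from the previous successful match.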

Named Captures

Named captures let us give a name to a capturing group so that the text it matched can be retrieved by that name later. A typical use is extracting a phone number from a line of contact information.

The basic syntax for a named capture is:

(?<name> …)

The parentheses enclose the capture. The ?<name> construct must immediately follow the opening parenthesis and provides the name for that capture. The rest of the group is an ordinary regular expression.

When the enclosing pattern matches successfully, Perl updates the special hash %+. Its keys are the capture names, and its values are the portions of the string matched by those captures.
Named captures, available since Perl 5.10, often improve regex maintainability. Name collisions are possible when small regexes are composed into larger ones, but they are relatively infrequent. The risk can be reduced by using named captures only in top-level regexes.

Example:

use strict;
use warnings;

# Extracting parts of the string with named captures
$_ = "The brown fox jumps over the lazy dog";

if (/the (?<color>\S+) (?<animal>\S+)/i) {
    # Printing the matches, looked up by name in %+
    print "color = $+{color}, animal = $+{animal}\n";
}

Output:

color = brown, animal = fox
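
The phone number case mentioned earlier can be sketched in the same way. This is only an illustration: the contact line, the number format and the capture names prefix and line are all assumptions made up for the example, not a general phone number parser.

```perl
use 5.010;   # named captures require Perl 5.10 or later
use strict;
use warnings;

# Hypothetical contact line, made up for this illustration
my $contact = "Contact: front desk, Phone: 555-0142, City: Pune";

my %phone;
if ($contact =~ /Phone:\s*(?<prefix>\d{3})-(?<line>\d{4})/) {
    # %+ is overwritten by the next successful match, so copy it first
    %phone = %+;
    print "prefix = $phone{prefix}, line = $phone{line}\n";
}
```

Copying %+ into an ordinary hash is a design choice that keeps the values safe if another regex runs before they are used.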

Numbered Captures

Numbered captures have no identifying name and do not affect %+. Instead, Perl stores each captured string in a series of special variables: the first capture goes into $1, the second into $2, and so on. Captures are numbered by the position of their opening parenthesis, counting from the left, so with nested groups the outer group gets the lower number.
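
The numbering rule is easiest to see with nested groups. Here is a small sketch, assuming a sample date string chosen for illustration. The outer group opens first and therefore becomes $1, and the three inner groups become $2, $3 and $4.

```perl
use strict;
use warnings;

my $date = "2020-08-01";   # sample value for illustration

my ($whole, $year, $month, $day);

# Group 1 opens first (the whole date); groups 2, 3 and 4 open inside it
if ($date =~ /((\d{4})-(\d\d)-(\d\d))/) {
    ($whole, $year, $month, $day) = ($1, $2, $3, $4);
    print "whole = $whole, year = $year, month = $month, day = $day\n";
}
```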

The syntax for named captures is longer than that for numbered captures, but it provides extra clarity. Regexes that rely on numbered captures are harder to maintain, because adding or removing a group renumbers the groups after it. Numbered captures remain useful in simple substitutions, where named captures would take unnecessarily more code.
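
The point about simple substitutions can be sketched as follows. The input string and the variable names are assumptions for illustration. Both substitutions turn "Surname, Forename" into "Forename Surname", and the numbered form is noticeably shorter.

```perl
use 5.010;   # needed only for the named-capture variant
use strict;
use warnings;

my $name = "Raina, Vishal";   # sample "Surname, Forename" input

# Numbered captures: $1 is the surname, $2 the forename
(my $swapped = $name) =~ s/^(\S+),\s*(\S+)$/$2 $1/;

# The same substitution with named captures, for comparison
(my $swapped_named = $name)
    =~ s/^(?<surname>\S+),\s*(?<forename>\S+)$/$+{forename} $+{surname}/;

print "$swapped\n";
print "$swapped_named\n";
```

The substitutions are applied to copies, so $name itself is left unchanged.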

Example:

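use strict;
use warnings;

# Extracting forename and surname
print "Please enter your name\n";

# Read one line from standard input, e.g. "Vishal Raina",
# and remove the trailing newline
my $name = <STDIN> // '';
chomp $name;

if ($name =~ /^\s*(\S+)\s+(\S+)\s*$/)
{
    # $1 holds the first word, $2 the second
    print "Hi $1. Your Surname is $2.";
}
else
{
    print "Error";
}
print "\n";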

Output (when the name entered is Vishal Raina):

Please enter your name 
Hi Vishal. Your Surname is Raina.
