Perl | Anchors in Regex
Last Updated: 06 Jan, 2023
Anchors in a Perl regex do not match any characters. Instead, they match a position: before, after, or between characters. They constrain where in the string a pattern may match, not what it matches.
This article covers the following anchors, together with a few related character-class constructs that are not anchors but often appear alongside them:
'^', '$', '\b', '\A', '\Z', '\z', '\G', '\p{....}', '\P{....}', '[:class:]'
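^ or \A: Matches the pattern at the beginning of the string. With the /m flag, ^ also matches at the start of each line, whereas \A always matches only at the very start of the string.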
Syntax: (/^pattern/, /\Apattern/).
Example:
$str = "guardians of the galaxy" ;
print "$&\n" if ( $str =~ /^guardians/);
print "$&\n" if ( $str =~ /\Agua/);
print "$&" if ( $str =~ /^ans/);
Output:
guardians
gua
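The difference between ^ and \A only shows up on strings with embedded newlines. The following is a minimal sketch, using an illustrative two-line string and variable names that are not part of the example above:

```perl
use strict;
use warnings;

# A two-line string: the second line starts with "galaxy".
my $text = "guardians\ngalaxy";

# Without /m, ^ behaves like \A and matches only at the very start.
my $caret_plain = ( $text =~ /^galaxy/ )   ? 1 : 0;    # 0
# With /m, ^ also matches right after each embedded newline.
my $caret_multi = ( $text =~ /^galaxy/m )  ? 1 : 0;    # 1
# \A ignores /m and still matches only at the start of the string.
my $big_a_multi = ( $text =~ /\Agalaxy/m ) ? 1 : 0;    # 0

print "caret_plain=$caret_plain caret_multi=$caret_multi big_a_multi=$big_a_multi\n";
```

If the intent is "start of the whole string" regardless of flags, \A is usually the safer choice.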
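$ or \z: Matches the pattern at the end of the string. $ also matches just before a final newline (and, with the /m flag, at the end of each line), whereas \z matches only at the very end of the string.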
Syntax: (/pattern$/, /pattern\z/).
Example:
$str = "guardians of the galaxy" ;
print "$&\n" if ( $str =~ /guardians$/);
print "$&\n" if ( $str =~ /y\z/);
print "$&" if ( $str =~ /galaxy$/);
Output:
y
galaxy
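How $ behaves depends on the /m flag. The sketch below, with an illustrative two-line string, collects the last word of the string and then the last word of every line; the variable names are assumptions made for this example only:

```perl
use strict;
use warnings;

my $text = "guardians of\nthe galaxy\n";

# Without /m, $ matches only at the end of the string
# (or just before the final newline).
my @default_hits = $text =~ /(\w+)$/g;     # ("galaxy")

# With /m, $ also matches before every embedded newline.
my @multi_hits = $text =~ /(\w+)$/mg;      # ("of", "galaxy")

# \z is unaffected by /m and does not tolerate the trailing newline.
my $z_hit = ( $text =~ /galaxy\z/m ) ? 1 : 0;    # 0

print "default: @default_hits\n";
print "multi:   @multi_hits\n";
print "z_hit:   $z_hit\n";
```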
\b: Matches at a word boundary, that is, a position with a word character (\w) on one side and either a non-word character (\W) or the start or end of the string on the other.
Syntax: (/\bpattern\b/).
Example:
$str = "guardians-of-the-galaxy" ;
print "$&\n" if ( $str =~ /\b-galaxy\b/);
print "$&\n" if ( $str =~ /\bguardians-\b/);
print "$&" if ( $str =~ /\be-galaxy\b/);
print "$&" if ( $str =~ /\bguardians-of-the-galaxy\b/);
Output:
-galaxy
guardians-
guardians-of-the-galaxy
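A common use of \b is matching a whole word rather than a substring. The sketch below counts occurrences of "the" three ways on an illustrative string; \B, the complement of \b (a position that is not a word boundary), is included for contrast even though it is not covered above:

```perl
use strict;
use warnings;

my $text = "the other theme: the-galaxy";

# In list context, m//g without capture groups returns every match.
my @whole  = $text =~ /\bthe\b/g;    # standalone "the" only
my @any    = $text =~ /the/g;        # every occurrence, even inside words
my @inside = $text =~ /\Bthe\B/g;    # "the" with word characters on both sides

printf "whole=%d any=%d inside=%d\n",
    scalar @whole, scalar @any, scalar @inside;
# prints: whole=2 any=4 inside=1
```

The two whole-word hits are the leading "the" and the one before the hyphen; the only fully embedded hit is inside "other".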
\Z: Matches at the end of the string, or just before a newline that ends the string. Both \z and \Z differ from $ in that they are not affected by the /m "multiline" flag, which allows $ to match at the end of any line.
print "one\n" if ( 'galaxy' =~ m/galaxy\z/);
print "two\n" if ( 'galaxy' =~ m/galaxy\Z/);
print "three\n" if ( "galaxy\n" =~ m/galaxy\Z/);
print "four\n" if ( "galaxy\n" =~ m/galaxy\n\z/);
print "five\n" if ( "galaxy\n" =~ m/galaxy\n\Z/);
print "six" if ( "galaxy\n" =~ m/galaxy\z/);
Output:
one
two
three
four
five
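These differences matter most when validating input. The sketch below compares two hypothetical helper functions (the names is_digits_strict and is_digits_loose are made up for this illustration): one anchors with \A and \z, the other with ^ and $.

```perl
use strict;
use warnings;

# Hypothetical helpers, written only for this illustration.
# Strict: the whole string must consist of digits, nothing else.
sub is_digits_strict {
    my ($s) = @_;
    return $s =~ /\A\d+\z/ ? 1 : 0;
}

# Loose: $ also accepts a single trailing newline.
sub is_digits_loose {
    my ($s) = @_;
    return $s =~ /^\d+$/ ? 1 : 0;
}

for my $input ( "2023", "2023\n" ) {
    ( my $shown = $input ) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-7s strict=%d loose=%d\n", $shown,
        is_digits_strict($input), is_digits_loose($input);
}
```

Both helpers accept "2023", but only the loose one accepts "2023\n", which is why \A and \z are often preferred when a value must match exactly.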
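\G: Matches at the position where the previous m//g match on the same string ended (the value of pos() for that string), or at the start of the string if there has been no previous match. In a loop with /g, each iteration must therefore begin exactly where the last one stopped: with a two-character pattern, the first match covers positions 0 and 1, the next must begin at position 2, and so on until the pattern fails or the string ends. The /c flag keeps the position from being reset when a match fails, so a later pattern applied to the same variable continues from that point (compare "four" and "five" in the example).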
$str = "galaxy8222as" ;
print "one: $& " while ( $str =~ /\G[a-z]{2}/gc);
print "\n" ;
print "two: $& " while ( "1122a44" =~ /\G\d\d/gc);
print "\n" ;
print "three: $& " while ( "galaxy8222as" =~ /\G\w{2}/gc);
print "four: $& " while ( $str =~ /\G[a-z]{2}/gc);
print "\n" ;
print "five: $& " while ( $str =~ /\G\w{2}/gc);
Output:
one: ga one: la one: xy
two: 11 two: 22
three: ga three: la three: xy three: 82 three: 22 three: as
five: 82 five: 22 five: as
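A typical application of \G with /gc is a small tokenizer, where several patterns take turns consuming the string from the current position. The following is a minimal sketch; the input string and the token labels (IDENT, NUM, OP) are assumptions chosen for illustration:

```perl
use strict;
use warnings;

my $input = "x1 = 42 + y";
my @tokens;

pos($input) = 0;    # start scanning at the beginning
while ( pos($input) < length $input ) {
    # Each alternative is anchored with \G, so it can only match at the
    # current position. /c keeps pos() unchanged when an alternative fails.
    if    ( $input =~ /\G\s+/gc )             { next }
    elsif ( $input =~ /\G([A-Za-z_]\w*)/gc )  { push @tokens, "IDENT:$1" }
    elsif ( $input =~ /\G(\d+)/gc )           { push @tokens, "NUM:$1" }
    elsif ( $input =~ /\G([=+])/gc )          { push @tokens, "OP:$1" }
    else {
        die "Unexpected character at offset " . pos($input) . "\n";
    }
}

print join( " ", @tokens ), "\n";
# prints: IDENT:x1 OP:= NUM:42 OP:+ IDENT:y
```

Without \G, a pattern such as /(\d+)/gc could skip ahead over characters it does not recognise; anchoring at the current position makes unexpected input detectable.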
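\p{…} and \P{…}: Unlike the anchors above, these consume a character. \p{…} matches any character that has the named Unicode property, such as IsLower, IsAlpha, or L (letter), whereas \P{…} is the complement and matches any character that lacks it.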
print "$&" while ( "guardians!@#%^*123" =~ /\p{isalpha}/gc);
print "\n" ;
print "$&" while ( "guardians!@#%^&*123" =~ /\p{isalnum}/gc);
print "\n" ;
print "$&" while ( "guardians!@#%^&*123" =~ /\P{L}/gc);
print "\n" ;
print "$&" while ( "guardians!@#%^&*123" =~ /\p{L}/gc);
Output:
guardians
guardians123
!@#%^&*123
guardians
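Unicode properties become more useful once the text goes beyond ASCII. The sketch below counts characters by property in an illustrative string; the string and the choice of properties (L, Lu, Han, Alnum) are assumptions for this example, and the non-ASCII characters are written as escapes so the source stays plain ASCII:

```perl
use strict;
use warnings;

# "Gr<u-umlaut>n", a space, two Han characters, a space, then "42".
my $text = "Gr\x{fc}n \x{4e2d}\x{6587} 42";

# The "= () =" idiom counts the matches returned by m//g in list context.
my $letters   = () = $text =~ /\p{L}/g;        # any letter, any script
my $uppercase = () = $text =~ /\p{Lu}/g;       # uppercase letters only
my $han       = () = $text =~ /\p{Han}/g;      # characters in the Han script
my $not_alnum = () = $text =~ /\P{Alnum}/g;    # neither letter nor digit

print "letters=$letters uppercase=$uppercase han=$han not_alnum=$not_alnum\n";
# prints: letters=6 uppercase=1 han=2 not_alnum=2
```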
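[:class:]: POSIX character classes such as digit, lower, ascii, etc. Like \p{…}, they match a character rather than a position, and they are valid only inside a bracketed character class.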
Syntax: (/[[:class:]]/)
POSIX character classes are as follows:
alpha, alnum, ascii, blank, cntrl, digit, graph, lower, punct, space, upper, xdigit, word
print "$&" while ( 'guardians!@#%^&*123' =~ /[[:alpha:]]/gc);
print "\n" ;
print "$&" while ( "guardians!@#%^&*123" =~ /[[:alnum:]]/gc);
print "\n" ;
print "$&" while ( "guardians!@#%^&*123" =~ /[[:digit:]]/gc);
print "\n" ;
print "$&" while ( "guardians!@#%^& 123\n" =~ /[[:graph:]]/gc);
print "\n" ;
print "1" while ( "guardians!@#%^& 123\n" =~ /[[:blank:]]/gc);
print "\n" ;
print "$&" while ( "Guardians!@#%^& 123\n" =~ /[[:lower:]]/gc);
print "\n" ;
print "$&" while ( "guardians!@#%^& 123\n" =~ /[[:ascii:]]/gc);
Output:
guardians
guardians123
123
guardians!@#%^&123
1
uardians
guardians!@#%^& 123
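POSIX classes can also be negated with a caret after the opening colon and combined with other items inside the same brackets. The following sketch uses an illustrative string; the variable names are assumptions for this example:

```perl
use strict;
use warnings;

# Single quotes keep every punctuation character literal.
my $text = 'guardians!@#%^& 123';

# [:^digit:] is the negated form: any character that is not a digit.
my $non_digits = join '', $text =~ /[[:^digit:]]/g;

# Several classes can share one bracketed character class.
my $alnum_only = join '', $text =~ /[[:alpha:][:digit:]]/g;

# [:punct:] matches punctuation and symbol characters.
my $punct = join '', $text =~ /[[:punct:]]/g;

print "non_digits=[$non_digits]\n";    # [guardians!@#%^& ]
print "alnum_only=[$alnum_only]\n";    # [guardians123]
print "punct=[$punct]\n";              # [!@#%^&]
```

Written outside the outer brackets, as in /[:alpha:]/, the construct is treated as an ordinary character class containing the characters ':', 'a', 'l', 'p', and 'h', which is rarely what is intended.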