Anchors in Perl regular expressions do not match any character at all. Instead, they match a position: before, after, or between characters. They constrain where in the string a pattern may match rather than what it matches.
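The following constructs are covered here:
'^', '$', '\b', '\A', '\Z', '\z', '\G', '\p{....}', '\P{....}', '[:class:]'
Strictly speaking, the last three ('\p{....}', '\P{....}', and '[:class:]') are character classes rather than anchors, because they do consume a character.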
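^ or \A: Both match at the beginning of the string. With the /m modifier, ^ also matches at the start of every line, while \A still matches only at the start of the string.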
Syntax: (/^pattern/, /\Apattern/).
Example:
#!/usr/bin/perl
use strict;
use warnings;

my $str = "guardians of the galaxy";

# prints 'guardians' because the string
# starts with 'guardians'
print "$&\n" if ($str =~ /^guardians/);

# prints 'gua' because the string starts with 'gua'
print "$&\n" if ($str =~ /\Agua/);

# prints nothing because the string
# does not start with 'ans'
print "$&" if ($str =~ /^ans/);
Output:
guardians
gua
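The two anchors behave differently once the /m modifier is used: ^ then also matches after each embedded newline, while \A keeps matching only at the start of the string. The following is a minimal sketch of that difference; the sample text and variable names are illustrative assumptions, not part of the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative three-line string (an assumption for this sketch)
my $text = "star lord\ngamora\ngroot";

# Without /m, ^ matches only at the very start of the string
my @caret_plain = $text =~ /^(\w+)/g;

# With /m, ^ also matches right after every embedded newline
my @caret_multi = $text =~ /^(\w+)/mg;

# \A ignores /m and still matches only at the start of the string
my @abs_start = $text =~ /\A(\w+)/mg;

print "^ without /m: @caret_plain\n";
print "^ with /m:    @caret_multi\n";
print "\\A with /m:   @abs_start\n";
```

If the intent is "the start of the whole string, regardless of modifiers", \A is the safer choice.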
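$ or \z: Both match at the end of the string. $ also matches just before a newline at the end of the string (and, with /m, at the end of every line), whereas \z matches only at the very end of the string.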
Syntax: (/pattern$/, /pattern\z/).
Example:
#!/usr/bin/perl
use strict;
use warnings;

my $str = "guardians of the galaxy";

# prints nothing because the string
# does not end with 'guardians'
print "$&\n" if ($str =~ /guardians$/);

# prints 'y' because the string ends with 'y'
print "$&\n" if ($str =~ /y\z/);

# prints 'galaxy' because the string
# ends with 'galaxy'
print "$&" if ($str =~ /galaxy$/);
Output:
y
galaxy
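The example above does not show where $ and \z actually differ, which is when the string ends in a newline. The sketch below assumes a hypothetical input line that still carries its trailing newline (as a line read without chomp would); the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input that still has its trailing newline
my $input = "2014\n";

# $ matches at the end of the string or just before a final newline,
# so this succeeds even though the string ends in "\n"
my $dollar_ok = ($input =~ /^\d+$/) ? 1 : 0;

# \z matches only at the very end, so the trailing newline makes this fail
my $z_ok = ($input =~ /\A\d+\z/) ? 1 : 0;

print "with \$:  ", ($dollar_ok ? "match" : "no match"), "\n";
print "with \\z: ", ($z_ok      ? "match" : "no match"), "\n";
```

When validating input strictly, \A...\z is therefore the stricter pair.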
\b: Matches at a word boundary, that is, a position between a word character (\w) and a non-word character (\W), in either order. The start or end of the string also counts as a boundary when the adjacent character is a word character.
Syntax: (/\bpattern\b/).
Example:
#!/usr/bin/perl
use strict;
use warnings;

my $str = "guardians-of-the-galaxy";

# prints '-galaxy': there is a boundary between 'e' and '-'
# and another at the end of the string
print "$&\n" if ($str =~ /\b-galaxy\b/);

# prints 'guardians-': there is a boundary at the start of
# the string and another between '-' and 'o'
print "$&\n" if ($str =~ /\bguardians-\b/);

# prints nothing: 'e' is preceded by the word character 'h',
# so there is no boundary before it
print "$&" if ($str =~ /\be-galaxy\b/);

# prints 'guardians-of-the-galaxy': the start and end of
# the string are both boundaries
print "$&" if ($str =~ /\bguardians-of-the-galaxy\b/);
Output:
-galaxy
guardians-
guardians-of-the-galaxy
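A common use of \b is to match whole words only. The sketch below counts occurrences of "the" with and without boundaries; the sample sentence and variable names are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sentence: "the" also appears inside "other" and "theme"
my $str = "the other theme, the end";

# Without \b, every occurrence of the three letters is counted
my $loose = () = $str =~ /the/g;

# With \b on both sides, only the standalone word "the" is counted
my $whole = () = $str =~ /\bthe\b/g;

print "substring matches:  $loose\n";
print "whole-word matches: $whole\n";
```

The `my $n = () = ...` construct runs the match in list context and stores the number of matches.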
\Z: Matches at the end of the string or just before a newline at the end of the string, whereas \z matches only at the very end. Unlike $, neither '\z' nor '\Z' is affected by the /m ("multiline") modifier, which lets $ match at the end of every line.
#!/usr/bin/perl
use strict;
use warnings;

# prints "one": \z matches at the very end of the string
print "one\n" if ('galaxy' =~ m/galaxy\z/);

# prints "two": \Z also matches at the very end of the string
print "two\n" if ('galaxy' =~ m/galaxy\Z/);

# prints "three": \Z matches just before the trailing newline
print "three\n" if ("galaxy\n" =~ m/galaxy\Z/);

# prints "four": the pattern consumes the newline,
# so \z is at the very end of the string
print "four\n" if ("galaxy\n" =~ m/galaxy\n\z/);

# prints "five": likewise, \Z matches at the very end
print "five\n" if ("galaxy\n" =~ m/galaxy\n\Z/);

# prints nothing: \z does not match before the trailing newline
print "six" if ("galaxy\n" =~ m/galaxy\z/);
Output:
one
two
three
four
five
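The effect of /m on $, and its lack of effect on \Z and \z, can be sketched with a two-line string. The sample text and array names below are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative two-line string ending in a newline
my $text = "groot\nrocket\n";

# With /m, $ matches at the end of every line
my @dollar = $text =~ /(\w+)$/mg;

# \Z ignores /m: only the end of the string (before the final newline)
my @big_z = $text =~ /(\w+)\Z/mg;

# \z ignores /m too, and the final newline is not a word character,
# so nothing matches here
my @small_z = $text =~ /(\w+)\z/mg;

print "\$  with /m: @dollar\n";
print "\\Z with /m: @big_z\n";
print "\\z with /m: ", scalar(@small_z), " matches\n";
```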
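\G: Matches at the position where the previous m//g match on the same string ended, as reported by pos(). If there was no previous match, it matches at the start of the string. In a /g loop, a pattern anchored with \G must therefore begin exactly where the last match stopped: a pattern that matches 5 characters first matches positions 1 to 5, then is tried from the 6th position onwards, and so on until the pattern fails or the string ends. With the /c modifier, a failed match leaves the position where it was instead of resetting it, so a later match on the same string can resume from there.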
#!/usr/bin/perl
use strict;
use warnings;

my $str = "galaxy8222as";

# prints pairs of lowercase letters until the pattern fails at '8';
# /c keeps the position (offset 6) after the failure
print "one: $& " while ($str =~ /\G[a-z]{2}/gc);
print "\n";

# prints pairs of digits until the pattern fails at 'a'
print "two: $& " while ("1122a44" =~ /\G\d\d/gc);
print "\n";

# a new string literal, so matching starts from offset 0
# and runs to the end of the string
print "three: $& " while ("galaxy8222as" =~ /\G\w{2}/gc);

# $str is still at offset 6, where [a-z]{2} fails again,
# so this prints nothing and the position stays where it was
print "four: $& " while ($str =~ /\G[a-z]{2}/gc);
print "\n";

# resumes from offset 6 in $str and prints the remaining pairs
print "five: $& " while ($str =~ /\G\w{2}/gc);
Output:
one: ga one: la one: xy
two: 11 two: 22
three: ga three: la three: xy three: 82 three: 22 three: as
five: 82 five: 22 five: as
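Because \G with /gc lets several patterns take turns at the same position, it is often used to write small tokenizers. The following is a minimal sketch under stated assumptions: the input expression, the token names (IDENT, NUM, OP), and the tiny grammar are all hypothetical and exist only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input for the sketch
my $expr = "x1 = 42 + y";
my @tokens;

pos($expr) = 0;    # start scanning at the beginning of the string
while (pos($expr) < length $expr) {
    # Each alternative is anchored with \G, so it must match exactly at
    # the current position; /c keeps that position when a match fails.
    if    ($expr =~ /\G\s+/gc)            { next }    # skip whitespace
    elsif ($expr =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "IDENT($1)" }
    elsif ($expr =~ /\G(\d+)/gc)          { push @tokens, "NUM($1)" }
    elsif ($expr =~ /\G([=+])/gc)         { push @tokens, "OP($1)" }
    else { die "Unexpected character at offset " . pos($expr) . "\n" }
}

print join(" ", @tokens), "\n";
```

Without /c, the first failed alternative would reset the position and the remaining alternatives would start again from the beginning of the string.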
\p{…} and \P{…}: \p{…} matches a single character that has the named Unicode property, such as IsLower or IsAlpha, whereas \P{…} is its complement and matches a character that does not have that property.
#!/usr/bin/perl
use strict;
use warnings;

# \p{isalpha} matches each alphabetic character
print "$&" while ("guardians!@#%^*123" =~ /\p{isalpha}/gc);
print "\n";

# \p{isalnum} matches each letter or digit
print "$&" while ("guardians!@#%^&*123" =~ /\p{isalnum}/gc);
print "\n";

# L is the Letter property; \P{L} is its complement,
# so it matches everything that is not a letter
print "$&" while ("guardians!@#%^&*123" =~ /\P{L}/gc);
print "\n";

# \p{L} matches each letter
print "$&" while ("guardians!@#%^&*123" =~ /\p{L}/gc);
Output:
guardians
guardians123
!@#%^&*123
guardians
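The examples above use ASCII text only, where \p{L} and [A-Za-z] give the same result. The sketch below adds three Greek letters (written as \x{...} escapes) to show where they differ; the sample string and variable names are assumptions for illustration, and only counts are printed to avoid depending on the terminal encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "galaxy", three Greek letters (gamma, alpha, lambda), and a number
my $str = "galaxy \x{3B3}\x{3B1}\x{3BB} 42";

# \p{L} matches any Unicode letter, including the Greek ones
my $unicode_letters = () = $str =~ /\p{L}/g;

# An ASCII range sees only the six letters of "galaxy"
my $ascii_letters = () = $str =~ /[A-Za-z]/g;

# \P{L} matches everything that is not a letter: two spaces and two digits
my $non_letters = () = $str =~ /\P{L}/g;

print "Unicode letters: $unicode_letters\n";
print "ASCII letters:   $ascii_letters\n";
print "Non-letters:     $non_letters\n";
```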
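[:class:]: POSIX character classes such as digit, lower, and ascii. They are valid only inside a bracketed character class, which is why the syntax below has two pairs of brackets.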
Syntax: (/[[:class:]]/)
POSIX character classes are as follows:
alpha, alnum, ascii, blank, cntrl, digit, graph, lower, punct, space, upper, xdigit, word
#!/usr/bin/perl
use strict;
use warnings;

# prints only the letters
print "$&" while ('guardians!@#%^&*123' =~ /[[:alpha:]]/gc);
print "\n";

# prints letters and digits
print "$&" while ("guardians!@#%^&*123" =~ /[[:alnum:]]/gc);
print "\n";

# prints only digits
print "$&" while ("guardians!@#%^&*123" =~ /[[:digit:]]/gc);
print "\n";

# prints every visible character, that is,
# everything except the space and the newline
print "$&" while ("guardians!@#%^& 123\n" =~ /[[:graph:]]/gc);
print "\n";

# prints "1" once for each space or horizontal tab;
# the string contains one space
print "1" while ("guardians!@#%^& 123\n" =~ /[[:blank:]]/gc);
print "\n";

# prints lowercase letters, so the leading 'G' is skipped
print "$&" while ("Guardians!@#%^& 123\n" =~ /[[:lower:]]/gc);
print "\n";

# prints every ASCII character, including the space and the newline
print "$&" while ("guardians!@#%^& 123\n" =~ /[[:ascii:]]/gc);
Output:
guardians
guardians123
123
guardians!@#%^&123
1
uardians
guardians!@#%^& 123
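Since a POSIX class lives inside a bracketed character class, it can be negated with ^ after the opening [: and combined with other classes in the same brackets. The sketch below illustrates both forms; the sample string and variable names are assumptions for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample string
my $str = "Guardians 2014!";

# A single class: uppercase letters only
my $upper = join '', $str =~ /[[:upper:]]/g;

# A negated class: [:^digit:] matches anything that is not a digit
my $no_digits = join '', $str =~ /[[:^digit:]]/g;

# Two classes in one bracket: lowercase letters or digits
my $lower_or_digit = join '', $str =~ /[[:lower:][:digit:]]/g;

print "upper:          '$upper'\n";
print "non-digits:     '$no_digits'\n";
print "lower or digit: '$lower_or_digit'\n";
```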