Perl | Special Character Classes in Regular Expressions

Last Updated : 27 Sep, 2019

Perl implements many different character classes, and some are used so frequently that a special escape sequence exists for them. These sequences make regular expressions shorter and more readable. The special character classes in Perl are as follows:

Digit \d [0-9]: The \d sequence matches any single digit character; for ASCII text it is equivalent to [0-9]. (By default \d can also match digits from other Unicode scripts; the /a modifier restricts it to 0-9.) For example, the regex /\d/ matches a single digit. The name \d stands for “digit”. Its main advantage is that patterns become shorter to write and easier to read. There are two ways to use this special character class. The first is to use it on its own in a pattern, as in the following example.
Example:
```
/#[MNOPQ]-\d\d\d\d\d/
```
The regex above matches strings such as:
```
#M-12345
#N-66666
```
A quantifier can also be applied to the character class.

Example:
```
/#[MNOPQ]-\d{5}/
```
This regex requires exactly five digits after the dash; \d{5} is a shorter way of writing \d five times in a row. To allow any number of digits (one or more) after the dash, write /#[MNOPQ]-\d+/ instead.
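The difference between \d{5} and \d+ can be checked with a short script. This is a minimal sketch: the sample labels and variable names are hypothetical, and the ^ and $ anchors are added here so that the whole string has to fit the pattern.

```perl
use strict;
use warnings;

# Hypothetical sample labels, chosen only for illustration.
my @labels = ('#M-12345', '#N-66666', '#P-123', '#Z-12345');

# Anchored with ^ and $ so the whole string must fit the pattern.
my @exactly_five = grep { /^#[MNOPQ]-\d{5}$/ } @labels;   # exactly five digits
my @one_or_more  = grep { /^#[MNOPQ]-\d+$/ }   @labels;   # one or more digits

print "exactly five digits: @exactly_five\n";
print "one or more digits:  @one_or_more\n";
```

The label '#P-123' passes only the \d+ version, and '#Z-12345' passes neither because Z is not in [MNOPQ].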

The second way is to use \d inside a larger bracketed character class. Placed inside square brackets, \d still matches a single digit character.

Example:
```
[\dABCDEFGHIJKLMN]
```
This matches a single digit or any one of the capital letters A, B, C, D, E, F, G, H, I, J, K, L, M or N. It can be written more briefly by using a dash (-) to give a range:
```
[\dA-N]
```
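As a rough check of the bracketed form, the sketch below tests a few single characters against [\dA-N]. The input characters and the %in_class hash are hypothetical names used only for this demonstration.

```perl
use strict;
use warnings;

# Hypothetical one-character inputs used only for this demonstration.
my @chars = ('7', 'A', 'N', 'O', 'a', '%');

my %in_class;
for my $ch (@chars) {
    # ^ and \z force the class to account for the whole string.
    $in_class{$ch} = ($ch =~ /^[\dA-N]\z/) ? 1 : 0;
    printf "%-3s %s\n", "'$ch'", $in_class{$ch} ? 'matches' : 'does not match';
}
```

Only '7', 'A' and 'N' match: 'O' lies just outside the A-N range, and the class is case sensitive, so 'a' is rejected as well.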

POSIX character classes: POSIX is a family of standards for maintaining compatibility between operating systems. It defines the application programming interface (API) along with command-line shells and utility interfaces. It also specifies a number of named “groups of characters” such as alpha, alnum, ascii and blank. A POSIX character class is always written in the form [:class:], where class is the name and [: and :] are the delimiters. POSIX character classes can only appear inside a bracketed character class. They are a convenient, self-describing way of listing a group of characters.

Syntax:

$string =~ /[[:class:]]/

Here class can be alpha, alnum, ascii etc.

A POSIX character class can be combined with other members inside a larger bracketed character class, as shown below:

[01[:class:]%]

This matches ‘0’, ‘1’, any character belonging to the named class, or the percent sign. Perl supports the POSIX character classes listed in the table below; the bracketed ranges show the equivalent for ASCII text:

Class	Description
alpha	Any alphabetical character (“[A-Za-z]”)
alnum	Any alphanumeric character (“[A-Za-z0-9]”).
ascii	Any character in the ASCII character set.
blank	A space or a horizontal tab
cntrl	Any control character.
digit	Any decimal digit (“[0-9]”).
graph	Any printable character, excluding a space
lower	Any lowercase character (“[a-z]”)
punct	Any graphical character excluding “word” characters (i.e. punctuation)
space	Any whitespace character
upper	Any uppercase character (“[A-Z]”)
xdigit	Any hexadecimal digit (“[0-9a-fA-F]”)
word	A Perl extension (“[A-Za-z0-9_]”), equivalent to “\w”
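
A few of the classes from the table can be tried out as follows. This is only a sketch: the sample text and the variable names are hypothetical, and alpha is used as one possible class name in the combined form.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = 'Room 42B, 3rd floor!';

# In list context, a /g match returns every matching character.
my @digits = $text =~ /[[:digit:]]/g;
my @upper  = $text =~ /[[:upper:]]/g;
my @punct  = $text =~ /[[:punct:]]/g;

# A POSIX class combined with other members in one bracketed class:
# matches 0, 1, any alphabetic character, or a percent sign.
my $percent_ok = ('%' =~ /^[01[:alpha:]%]\z/) ? 1 : 0;

print "digits: @digits\n";
print "upper:  @upper\n";
print "punct:  @punct\n";
print "percent sign accepted: $percent_ok\n";
```

Note the doubled brackets: the outer pair forms the bracketed character class and the inner [: :] pair names the POSIX class.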

Word character \w [0-9a-zA-Z_]: The \w sequence is the word character class. It matches any single alphanumeric character (an alphabetic character or a decimal digit) or the underscore (_). It matches only one character of a word, not the whole word. To match a whole word, use \w+.
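
The difference between \w and \w+ is sketched below. The sample text and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = 'my_var = foo42;';

my ($first_char) = $text =~ /(\w)/;     # a single word character
my @words        = $text =~ /(\w+)/g;   # whole runs of word characters

print "first word character: $first_char\n";
print "words: @words\n";
```

Here \w alone captures just the first character, while \w+ captures my_var and foo42 as whole words; the underscore counts as a word character, but the space, = and ; do not.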

Whitespace \s [\t\n\f\r ]: The character class \s matches a single whitespace character. It matches any of these five characters: \t (horizontal tab), \n (newline), \f (form feed), \r (carriage return), and the space. Starting with Perl v5.18, \s also matches the vertical tab (\cK).
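
A common use of \s is splitting a line into fields. The following is a minimal sketch with a hypothetical input line and variable names.

```perl
use strict;
use warnings;

# Hypothetical input containing a tab, a run of spaces and a newline.
my $line = "alpha\tbeta   gamma\n";

my @fields   = split /\s+/, $line;      # split on runs of whitespace
my $ws_count = () = $line =~ /\s/g;     # count individual whitespace characters

print "fields: @fields\n";
print "whitespace characters: $ws_count\n";
```

Using \s+ rather than \s in split treats the run of three spaces as one separator, so no empty fields appear between beta and gamma.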

Negated character classes \D, \W, \S: To negate a bracketed character class, put a caret (^) immediately after the opening bracket; the class then matches any character except the listed characters or ranges. Because there are more than 110,000 Unicode characters, spelling out everything that is not a digit would be impractical, so negation is the practical choice. For example, [^\d] matches any character that is not a digit. Instead of [^\d] we can simply write \D. The following table lists the special negated character classes:

Character Class	Negated	Equivalent	Description
\d	\D	[^\d]	matches any non-digit character
\s	\S	[^\s]	matches any non-whitespace character
\w	\W	[^\w]	matches any non-“word” character
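
The negated classes in the table can be exercised as follows. This is a sketch only: the input strings and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical inputs, used only for illustration.
my $raw      = '(555) 010-9999';
my $sentence = 'a-b  c!';
my $list     = 'foo, bar';

(my $digits_only = $raw) =~ s/\D//g;       # delete every non-digit
(my $same_result = $raw) =~ s/[^\d]//g;    # same deletion, written as a negated class

my @tokens = $sentence =~ /(\S+)/g;        # runs of non-whitespace characters
my @seps   = $list     =~ /(\W+)/g;        # runs of non-word characters

print "digits only: $digits_only\n";
print "tokens: @tokens\n";
print "separator: '$seps[0]'\n";
```

Both substitutions leave the same string, which illustrates that \D and [^\d] are interchangeable.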

Unicode character classes: Unicode is a definition of “all” existing characters. The Unicode Standard assigns a unique number to every character, independent of platform. There are more than 100,000 characters, and each one is identified by a code point. Many characters are also grouped together by shared properties.
Syntax:
```
\p{...any character...}
```
This syntax matches a single character that belongs to the group named inside the braces, which is a Unicode property such as L (letters) or Nd (decimal digits). To match any character that is not in the group, use the corresponding \P{…} expression.
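
As an illustration, the sketch below uses the Unicode general-category properties L (letter) and Lu (uppercase letter). The sample text and variable names are hypothetical, and the non-ASCII letters are written as \x{...} escapes so the script does not depend on the file encoding.

```perl
use strict;
use warnings;

# Hypothetical sample text: "Perl 5, <Greek small lambda>, <Cyrillic capital Zhe>!"
my $text = "Perl 5, \x{3BB}, \x{416}!";

my @letters     = $text =~ /\p{L}/g;    # any Unicode letter
my @uppercase   = $text =~ /\p{Lu}/g;   # uppercase letters only
my @non_letters = $text =~ /\P{L}/g;    # everything that is not a letter

printf "letters: %d, uppercase: %d, non-letters: %d\n",
    scalar @letters, scalar @uppercase, scalar @non_letters;
```

The Greek and Cyrillic characters are counted as letters alongside the Latin ones, which a class such as [A-Za-z] would miss.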
