POSIX Basic Regular Expressions

POSIX stands for Portable Operating System Interface. It defines a set of standard operating system interfaces based on the UNIX OS. These standards are specified by the IEEE (Institute of Electrical and Electronics Engineers) to maintain compatibility between different operating systems. POSIX is designed so that programs written for one POSIX-compliant system can be moved to another with little or no modification.

POSIX standards define application programming interfaces (APIs) at both the system and user levels, along with command-line shells and utility interfaces, so that software stays compatible and portable across the various flavors of the Unix operating system and other operating systems as well. The IEEE also owns the POSIX trademark. Developers are encouraged to target POSIX because it saves effort and makes software easier to port.

History of POSIX:

Earlier, computer programmers had to write different programs from scratch for every single computer model. This was time-consuming and in most cases very difficult, as each computer model differed from the others and programs had to be written with the specific hardware in mind. After AT&T Bell Labs launched the Unix OS, nothing was the same anymore. Unix quickly became popular, and variants such as BSD and Xenix came into the picture. The growing number of Unix variants made it even more difficult for developers to maintain their software, since they now had to support many operating systems as well as many devices. This prompted the establishment of a set of standards that later came to be known as POSIX. They were first released in 1988 under the name IEEE Standard 1003.1-1988. The main aim was to establish pre-defined rules that future systems could follow to maintain portability regardless of their hardware or manufacturer.

What are Regular Expressions?

Regular expressions, often shortened to regex, are character sequences that define search patterns for text. They are used for various string-based operations such as searching, finding and replacing, and validating input.

Regex is used in search engines, word processors like MS Word, text editors, and text processing utilities such as AWK or sed. Most general-purpose programming languages, such as C, C++, Python, JS, and Java, support regex either natively or through libraries.
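
The three operations named above (searching, finding and replacing, and validating input) can be sketched in a few lines. The example uses Python's built-in re module purely for illustration; its syntax is not POSIX, and the sample text, patterns, and function names are assumptions made for this sketch.

```python
import re

# Hypothetical sample text, used for illustration only.
TEXT = "The cat sat on the mat near a hat."

def find_words(text):
    """Search: return every lowercase three-character match ending in 'at'."""
    return re.findall(r"[a-z]at", text)

def replace_cat(text):
    """Find and replace: swap every 'cat' for 'dog'."""
    return re.sub(r"cat", "dog", text)

def is_three_digits(value):
    """Validate: True only if the whole input is exactly three digits."""
    return re.fullmatch(r"[0-9]{3}", value) is not None

if __name__ == "__main__":
    print(find_words(TEXT))
    print(replace_cat(TEXT))
    print(is_three_digits("123"), is_three_digits("12a"))
```

Validation uses a whole-string match on purpose: a plain search would accept "1234" because it contains three digits somewhere.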

POSIX Regular Expressions:

One of the POSIX standards defines two flavors of regular expressions: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). On POSIX systems, “grep” uses the basic syntax by default, while “egrep” (equivalent to “grep -E”) uses the extended syntax.

In POSIX Basic Regex syntax, most characters are treated as literals, i.e. they match only themselves (e.g. j will match “j”). However, there are some exceptions, which are called metacharacters.

Metacharacter Description
. Matches any single character. Within a POSIX bracket expression, the dot matches a literal dot. For example, a.c matches “abc”, “a1c”, etc., but [a.c] matches only “a”, “.”, or “c”.
- Defines a range inside a bracket expression. For example, [a-d] matches any character from a to d, both inclusive.
[ ] Matches any single character listed inside the square brackets. For example, [ab] matches a or b.
^ As the first character inside square brackets, the ^ (caret) negates the expression. For example, [^a] matches any character except a. As the first character of the regular expression, it matches the starting position of the string.
$ Matches the ending position of the string if it is the last character of the regular expression.
* Matches the preceding element 0 or more times. For example, a*d matches d, ad, aaaaad, etc.
\{n\} Matches the preceding element exactly n times. For example, [0-9]\{3\} matches 123, 234, 345, etc. In basic syntax the braces must be escaped with backslashes; the extended syntax writes {n}.
\{n,m\} Matches the preceding element at least n times and not more than m times. For example, [0-9]\{3,5\} matches 123, 3456, 45668, etc. The extended syntax writes {n,m}.
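
The behaviour of each metacharacter can be checked with a short script. This is only a sketch: it uses Python's re module, which is not a POSIX engine but treats the constructs below the same way. One difference to keep in mind is that Python writes intervals as {n} and {n,m} without backslashes, as POSIX extended syntax does, whereas POSIX basic syntax writes \{n\} and \{n,m\}. The helper names `matches` and `found` are hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

def found(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# (pattern, input, expected result of a whole-string match)
CHECKS = [
    ("a.c", "abc", True),           # . matches any single character
    ("[a.c]", ".", True),           # inside brackets, . is a literal dot
    ("[a.c]", "b", False),
    ("[a-d]", "c", True),           # - defines a range
    ("[ab]", "b", True),            # brackets list the allowed characters
    ("[^a]", "a", False),           # ^ inside brackets negates
    ("a*d", "d", True),             # * allows zero repetitions
    ("a*d", "aaaaad", True),
    ("[0-9]{3}", "123", True),      # exactly three digits
    ("[0-9]{3,5}", "45668", True),  # between three and five digits
    ("[0-9]{3,5}", "12", False),
]

if __name__ == "__main__":
    for pattern, text, expected in CHECKS:
        result = matches(pattern, text)
        print(f"{pattern!r:14} {text!r:10} -> {result}")
        assert result == expected
    # Anchors only make sense with a search, not a whole-string match.
    print(found("^cat", "catalog"), found("cat$", "catalog"))
```

The anchors are tested with `found` rather than `matches` because a whole-string match already pins both ends of the input.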

Some examples:

1. To match any three-character string ending with at. For example, cat, hat, etc.

.at

2. To match any three-character string ending in at, except bat.

[^b]at

3. To match hat and cat, but only at the beginning of the string or line.

^[hc]at

4. To match uppercase letters.

[[:upper:]] (similar to [A-Z]; the class name [:upper:] is only valid inside a bracket expression)

5. To match lowercase letters.

[[:lower:]] (similar to [a-z]; the class name [:lower:] is only valid inside a bracket expression)

6. To match whitespace characters.

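[[:space:]] (matches space, tab, newline, carriage return, form feed, and vertical tab; escapes such as \t are not interpreted inside a POSIX bracket expression, where the backslash is a literal character)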
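
The six examples above can be tried out with the sketch below. It again relies on Python's re module, which does not understand POSIX character classes, so a small hypothetical `translate` helper substitutes ASCII equivalents first. That substitution assumes the C locale and is an approximation for illustration, not a general POSIX-to-Python converter. In POSIX syntax the class names appear inside a bracket expression, as in [[:upper:]].

```python
import re

# Hypothetical lookup table: POSIX class name -> ASCII characters (C locale assumed).
POSIX_CLASSES = {
    "[:upper:]": "A-Z",
    "[:lower:]": "a-z",
    "[:space:]": " \t\n\r\f\v",
}

def translate(pattern):
    """Replace POSIX class names with ASCII ranges that Python can read."""
    for name, ascii_range in POSIX_CLASSES.items():
        pattern = pattern.replace(name, ascii_range)
    return pattern

def find_all(pattern, text):
    """Return every non-overlapping match of the translated pattern."""
    return re.findall(translate(pattern), text)

if __name__ == "__main__":
    print(find_all(".at", "cat hat bat"))           # example 1
    print(find_all("[^b]at", "cat hat bat"))        # example 2
    print(find_all("^[hc]at", "hat and cat"))       # example 3
    print(find_all("[[:upper:]]", "Hello World"))   # example 4
    print(find_all("[[:lower:]]", "Hi Bob"))        # example 5
    print(find_all("[[:space:]]", "a b\tc"))        # example 6
```

On a POSIX system the same patterns can be passed to grep directly, with no translation step.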

Last Updated : 24 May, 2022