Regular Expressions, Regular Grammar and Regular Languages

Last Updated : 19 Jul, 2023

As discussed in Chomsky Hierarchy, regular languages are the most restricted type of language and are accepted by finite automata.

Regular Expressions

Regular expressions are used to denote regular languages. They are defined by the following rules:

∅ is a regular expression for the regular language ∅.
ε is a regular expression for the regular language {ε}.
If a ∈ Σ (Σ represents the input alphabet), a is a regular expression with language {a}.
If a and b are regular expressions, a + b is also a regular expression, denoting the union of their languages (for two symbols a and b, the language {a, b}).
If a and b are regular expressions, ab (the concatenation of a and b) is also a regular expression.
If a is a regular expression, a* (a repeated 0 or more times) is also a regular expression.

Description	Regular Expression	Regular Language
set of vowels	( a ∪ e ∪ i ∪ o ∪ u )	{a, e, i, o, u}
a followed by 0 or more b	(a.b*)	{a, ab, abb, abbb, abbbb, ...}
any number of vowels followed by any number of consonants	v*.c* (where v = vowels and c = consonants)	{ε, a, aou, aiou, b, abcd, ...} where ε represents the empty string (the case of 0 vowels and 0 consonants)
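
The rows of the table can be tried out with Python's `re` module. The following is a minimal sketch rather than part of the original article. It assumes the alphabet is the lowercase English letters, and the names `VOWELS`, `A_THEN_BS`, `VOWELS_THEN_CONSONANTS` and `in_language` are made up for illustration. Note that Python writes union as `|` or as a character class instead of `+` or ∪, and concatenation needs no dot.

```python
import re

# Textbook union becomes "|" or a character class; concatenation is implicit.
VOWELS = re.compile(r"[aeiou]")   # ( a ∪ e ∪ i ∪ o ∪ u )
A_THEN_BS = re.compile(r"ab*")    # a.b*
# Assumption: the alphabet is a-z, so consonants are all non-vowel letters.
VOWELS_THEN_CONSONANTS = re.compile(r"[aeiou]*[b-df-hj-np-tv-z]*")  # v*.c*


def in_language(pattern, word):
    """Return True if the whole word belongs to the pattern's language."""
    # fullmatch is used because membership concerns the entire string,
    # not just a prefix or a substring.
    return pattern.fullmatch(word) is not None


examples = {
    "e": in_language(VOWELS, "e"),
    "abbb": in_language(A_THEN_BS, "abbb"),
    "abcd": in_language(VOWELS_THEN_CONSONANTS, "abcd"),
    "ba": in_language(VOWELS_THEN_CONSONANTS, "ba"),
}
```

Using `fullmatch` rather than `search` matters here: `search` would report a match whenever any substring fits, which is not language membership.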

Regular Grammar :
A grammar is regular if every rule has the form A -> a, A -> aB or A -> ε, where A and B are nonterminals, a is a terminal and ε is a special symbol called NULL (the empty string).
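
Because every rule consumes at most one terminal and leaves at most one nonterminal, a grammar of this form can be read as a nondeterministic finite automaton whose states are the nonterminals. The sketch below is one way to test membership under that reading. The function name `accepts`, the encoding of rules as short strings, and the two sample grammars `G` and `H` are assumptions made for illustration, and nonterminals are assumed to be single characters.

```python
def accepts(grammar, start, word):
    """Decide membership for a regular grammar by simulating it as an NFA.

    grammar maps each nonterminal to a list of right-hand sides:
      ""   stands for A -> ε
      "a"  stands for A -> a
      "aB" stands for A -> aB (one terminal, then one nonterminal)
    """
    FINAL = object()  # extra accepting state reached by rules A -> a
    current = {start}
    for symbol in word:
        following = set()
        for state in current:
            if state is FINAL:
                continue  # nothing can be derived after a rule A -> a
            for rhs in grammar[state]:
                if rhs and rhs[0] == symbol:
                    following.add(rhs[1] if len(rhs) == 2 else FINAL)
        current = following
    # Accept in FINAL, or in a nonterminal that can vanish via A -> ε.
    return any(s is FINAL or "" in grammar[s] for s in current)


# Hypothetical sample grammars (not from the article):
G = {"S": ["aS", "bA"], "A": ["bA", ""]}  # S -> aS | bA, A -> bA | ε
H = {"S": ["aS", "b"]}                    # S -> aS | b
```

With these samples, `accepts(G, "S", "aabbb")` is true because G generates zero or more a's followed by at least one b, while `accepts(H, "S", "aaba")` is false because H stops after the first b.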
Regular Languages :
A language is regular if it can be expressed by a regular expression.
Closure Properties of Regular Languages
Union :
If L1 and L2 are two regular languages, their union L1 ∪ L2 will also be regular. For example, L1 = {a^n | n ≥ 0} and L2 = {b^n | n ≥ 0}; L3 = L1 ∪ L2 = {a^n | n ≥ 0} ∪ {b^n | n ≥ 0} is also regular.
Intersection :
If L1 and L2 are two regular languages, their intersection L1 ∩ L2 will also be regular. For example, L1 = {a^m b^n | n ≥ 0 and m ≥ 0} and L2 = {a^m b^n ∪ b^n a^m | n ≥ 0 and m ≥ 0}; L3 = L1 ∩ L2 = {a^m b^n | n ≥ 0 and m ≥ 0} is also regular.
Concatenation :
If L1 and L2 are two regular languages, their concatenation L1.L2 will also be regular. For example, L1 = {a^n | n ≥ 0} and L2 = {b^n | n ≥ 0}; L3 = L1.L2 = {a^m . b^n | m ≥ 0 and n ≥ 0} is also regular.
Kleene Closure :
If L1 is a regular language, its Kleene closure L1* will also be regular. For example, L1 = (a ∪ b), L1* = (a ∪ b)*.
Complement :
If L(G) is a regular language, its complement L’(G) will also be regular. The complement of a language is found by removing the strings that are in L(G) from the set of all possible strings over the alphabet. For example, over the alphabet {a}, L(G) = {a^n | n > 3} and L’(G) = {a^n | n ≤ 3}.
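
Union, intersection and complement can be made concrete on deterministic finite automata: the standard product construction runs two automata side by side, and complement swaps accepting and non-accepting states. The sketch below illustrates that idea under stated assumptions. The `DFA` class and the helpers `only` and `more_than_three_as` are hypothetical names introduced here, both automata in a product are assumed to share one alphabet, and transition tables are assumed to be total. Concatenation and Kleene closure are usually shown with NFAs and are not covered by this sketch.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class DFA:
    states: frozenset
    alphabet: frozenset
    delta: dict          # (state, symbol) -> state, defined for every pair
    start: object
    accepting: frozenset

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


def product_dfa(d1, d2, keep):
    """Run d1 and d2 in lockstep; keep(in1, in2) picks the accepting pairs."""
    states = frozenset(product(d1.states, d2.states))
    delta = {((p, q), c): (d1.delta[(p, c)], d2.delta[(q, c)])
             for (p, q) in states for c in d1.alphabet}
    accepting = frozenset((p, q) for (p, q) in states
                          if keep(p in d1.accepting, q in d2.accepting))
    return DFA(states, d1.alphabet, delta, (d1.start, d2.start), accepting)


def union(d1, d2):
    return product_dfa(d1, d2, lambda x, y: x or y)


def intersection(d1, d2):
    return product_dfa(d1, d2, lambda x, y: x and y)


def complement(d):
    return DFA(d.states, d.alphabet, d.delta, d.start, d.states - d.accepting)


def only(symbol, alphabet=frozenset("ab")):
    """DFA for {symbol^n | n >= 0} over the given alphabet."""
    delta = {("ok", c): ("ok" if c == symbol else "dead") for c in alphabet}
    delta.update({("dead", c): "dead" for c in alphabet})
    return DFA(frozenset({"ok", "dead"}), alphabet, delta, "ok",
               frozenset({"ok"}))


def more_than_three_as():
    """DFA for {a^n | n > 3} over the one-letter alphabet {a}."""
    delta = {(i, "a"): min(i + 1, 4) for i in range(5)}
    return DFA(frozenset(range(5)), frozenset("a"), delta, 0, frozenset({4}))


a_or_b_runs = union(only("a"), only("b"))        # {a^n} ∪ {b^n}
common = intersection(only("a"), only("b"))      # only the empty string
at_most_three_as = complement(more_than_three_as())
```

Here `a_or_b_runs` accepts `aaa` and `bb` but rejects `ab`, and `at_most_three_as` accepts `aaa` but rejects `aaaa`, matching the union and complement examples above.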
Note :
Two regular expressions are equivalent if the languages generated by them are the same. For example, (a+b*)* and (a+b)* generate the same language: every string generated by (a+b*)* is also generated by (a+b)* and vice versa.
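
One way to build confidence in such an equivalence is to enumerate both languages up to a length bound and compare the resulting sets. The sketch below does this directly from the definitions of union, concatenation and Kleene star. The helper names `concat` and `star` and the bound `N = 6` are arbitrary choices made for illustration, and agreement up to a bound is supporting evidence, not a proof.

```python
def concat(A, B, n):
    """All strings xy with x in A and y in B, keeping only length <= n."""
    return {x + y for x in A for y in B if len(x + y) <= n}


def star(A, n):
    """Kleene closure of A, restricted to strings of length <= n."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from A and keep
        # only strings not seen before, so the loop terminates.
        frontier = concat(frontier, A, n) - result
        result |= frontier
    return result


N = 6                               # length bound for the comparison
a, b = {"a"}, {"b"}
left = star(a | star(b, N), N)      # (a + b*)*
right = star(a | b, N)              # (a + b)*
same_up_to_bound = left == right
```

Both sets contain all 127 strings over {a, b} of length at most 6, so `same_up_to_bound` is true.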
How to solve problems on regular expression and regular languages?
Question 1 :
Which one of the following languages over the alphabet {0,1} is described by the regular expression (0+1)*0(0+1)*0(0+1)*?
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0’s.
(C) The set of all strings containing at least two 0’s.
(D) The set of all strings that begin and end with either 0 or 1.
Solution :
Option (A) says that the string must contain the substring 00. But 10101 is also in the language and does not contain 00 as a substring, so (A) is not correct. Option (B) says that the string can have at most two 0’s, but 00000 is also in the language, so (B) is not correct. Option (C) says that the string must contain at least two 0’s. The regular expression contains two 0’s, each of which may be preceded and followed by any string over {0,1}, so (C) is correct. Option (D) describes every non-empty string, since any such string begins and ends with either 0 or 1. But strings such as 1 and 01 contain fewer than two 0’s and are not generated by the expression, so (D) is not correct.
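
The reasoning for Question 1 can be cross-checked by brute force: enumerate all binary strings up to some length and test which option agrees with the regular expression on every one of them. The following is a rough sketch using Python's `re` module, where the textbook `+` is written as `|`. The names `PATTERN`, `words`, `agrees` and `OPTIONS` and the length bound of 8 are assumptions made for illustration, and a bounded check can refute an option but cannot prove one.

```python
import re
from itertools import product

PATTERN = re.compile(r"(0|1)*0(0|1)*0(0|1)*")  # (0+1)*0(0+1)*0(0+1)*


def words(max_len):
    """Yield every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)


def agrees(predicate, max_len=8):
    """True if predicate and PATTERN classify every short word the same way."""
    return all((PATTERN.fullmatch(w) is not None) == predicate(w)
               for w in words(max_len))


OPTIONS = {
    "A": lambda w: "00" in w,            # contains the substring 00
    "B": lambda w: w.count("0") <= 2,    # at most two 0's
    "C": lambda w: w.count("0") >= 2,    # at least two 0's
    "D": lambda w: len(w) >= 1,          # begins and ends with 0 or 1
}

matching = [name for name, predicate in OPTIONS.items() if agrees(predicate)]
```

Only option C survives the check, so `matching` is `["C"]`.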
Question 2 :
Which of the following languages is generated by the given grammar? S -> aS | bS | ε
(A) {a^n b^m | n, m ≥ 0}
(B) {w ∈ {a,b}* | w has equal number of a’s and b’s}
(C) {a^n | n ≥ 0} ∪ {b^n | n ≥ 0} ∪ {a^n b^n | n ≥ 0}
(D) {a,b}*
Solution :
Option (A) says that the string will have 0 or more a’s followed by 0 or more b’s. But S => bS => baS => ba is also in the language, so (A) is not correct. Option (B) says that the string will have an equal number of a’s and b’s. But S => bS => b is also in the language, so (B) is not correct. Option (C) says that the string will have either only a’s, or only b’s, or a’s followed by an equal number of b’s. But as shown for option (A), ba is also in the language, so (C) is not correct. Option (D) says that the string can have any number of a’s and any number of b’s in any order. So (D) is correct.
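
The answer to Question 2 can also be checked mechanically by deriving every short string from the grammar and comparing the result with all strings over {a, b}. The sketch below is a simple worklist search over sentential forms. The function name `generated`, the encoding of the rules as a dictionary, and the length bound of 4 are assumptions made for illustration, and the code assumes a right-linear grammar in which at most one nonterminal appears, at the right end.

```python
from itertools import product

RULES = {"S": ["aS", "bS", ""]}   # S -> aS | bS | ε


def generated(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    done, seen, todo = set(), {start}, [start]
    while todo:
        form = todo.pop()
        head, last = form[:-1], form[-1:]
        if last not in rules:
            done.add(form)            # no nonterminal left: a finished word
            continue
        for rhs in rules[last]:
            nxt = head + rhs
            # Count terminals only, and prune forms that are already too long.
            terminals = sum(ch not in rules for ch in nxt)
            if terminals <= max_len and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return done


all_words = {"".join(p) for n in range(5) for p in product("ab", repeat=n)}
derived = generated(RULES, "S", 4)
covers_everything = derived == all_words
```

Every one of the 31 strings of length at most 4 is derived, including `ba`, which is the counterexample used against options (A) and (C).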
Question 3 :
The regular expression 0*(10*)* denotes the same set as
(A) (1*0)*1*
(B) 0 + (0 + 10)*
(C) (0 + 1)* 10(0 + 1)*
(D) none of these
Solution :
Two regular expressions are equivalent if the languages generated by them are the same. 0*(10*)* generates every string over {0,1}. Option (A) also generates every string over {0,1}, so they are equivalent. Option (B) cannot generate the string 1, because every 1 it produces is followed by a 0, but 0*(10*)* can. So they are not equivalent. Option (C) always has 10 as a substring, but strings generated by 0*(10*)* may or may not. So they are not equivalent. Hence (A) is correct.
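
For Question 3, a shortest string on which two expressions disagree makes the argument concrete. The sketch below enumerates binary strings up to a bound and reports such a witness for each option, writing the textbook `+` as `|` for Python's `re` module. The names `TARGET`, `OPTIONS`, `language` and `first_difference` and the bound of 8 are assumptions made for illustration. Finding no difference within the bound suggests equivalence but does not prove it.

```python
import re
from itertools import product

TARGET = r"0*(10*)*"
OPTIONS = {
    "A": r"(1*0)*1*",
    "B": r"0|(0|10)*",         # 0 + (0 + 10)*
    "C": r"(0|1)*10(0|1)*",    # (0 + 1)* 10 (0 + 1)*
}


def language(pattern, max_len):
    """Set of binary strings of length <= max_len matched in full."""
    compiled = re.compile(pattern)
    found = set()
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            word = "".join(letters)
            if compiled.fullmatch(word):
                found.add(word)
    return found


def first_difference(p, q, max_len=8):
    """Shortest word accepted by exactly one pattern, or None if none found."""
    diff = language(p, max_len) ^ language(q, max_len)
    return min(diff, key=lambda w: (len(w), w)) if diff else None


witnesses = {name: first_difference(TARGET, pattern)
             for name, pattern in OPTIONS.items()}
```

The result has no witness for option A, the string `1` for option B, and the empty string for option C, which agrees with the solution above.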
Question 4 :
The regular expression for the language over the input alphabet {a, b} in which two a’s do not come together:
(A) (b + ab)* + (b + ab)*a
(B) a(b + ba)* + (b + ba)*
(C) both options (A) and (B)
(D) none of the above
Solution:
Option (C), stating that both options (A) and (B) are correct, is the answer. The language in the question can be expressed as L = {ε, a, b, bb, ab, aba, ba, bab, baba, abab, …}. In option (A), ‘ab’ is the building block of the regular expression: (b + ab)* covers all strings that end with ‘b’ (and the empty string), and (b + ab)*a covers all strings that end with ‘a’. Applying similar logic to option (B), the regular expression uses ‘ba’ as the building block: a(b + ba)* covers all strings starting with ‘a’, and (b + ba)* covers all strings starting with ‘b’ (and the empty string).
This article has been contributed by Sonal Tuteja.
