How to identify if a language is regular or not



Prerequisite – Regular Expressions, Regular Grammar and Regular Languages, Pumping Lemma
There is a well established theorem to to identify if a language is regular or not, based on Pigeon Hole Principle, called as Pumping Lemma. But pumping lemma is a negativity test, i.e. if a language doesn’t satisfy pumping lemma, then we can definitely say that it is not regular, but if it satisfies, then the language may or may not be regular. Pumping Lemma is more of a mathematical proof, takes more time and it may be confusing sometimes. In exams, we need to address this problem very quickly, so based on common observations and analysis, here are some quick rules that will help:

  1. Every finite set represents a regular language.
    Example – All strings of length = 2 over {a, b}* i.e. L = {aa, ab, ba, bb} is regular.

  2. Given an expression of non-regular language, but the value of parameter is bounded by some constant, then the language is regular.
    Example – L = {a^n b^n | n <= 10^10^10} is regular, because it is upper bounded and thus a finite language.

  3. The pattern of strings form an A.P.(Arithmetic Progression) is regular, but the pattern with G.P.(Geometric Progression) is not regular.
    Example 1 – L = { a^n | n>=2 } is regular. Clearly, it forms an A.P., so we can draw a finite automaton with 3 states.

    Example 2 – L = { a^2^n | n>=1 } is not regular. It forms a G.P., so we cannot have a fixed pattern which repeated would generate all strings of this language.



  4. No pattern is present, that could be repeated to generate all the strings in language is not regular.
    Example – L = { a^p | p is prime. } is not regular.

  5. A concatenation of pattern(regular) and a non-pattern(not regular) is also not a regular language.
    Example – L = { a^n b^{2m} | n>=1, m>=1 } is not regular.

  6. Whenever an unbounded storage is required for storing the count and then comparison with other unbounded count, then the language is not regular.
    Example 1 – L = { a^n b^n | n>=1 } is not regular, because we need to first store the count of a and then compare it against count of b, but finite automaton has a limited storage, but in L, there can be infinite number of a’s coming in, which can’t be stored.

    Example 2 – L = { w | na(w) = nb(w) } is not regular.

    Example 3 – L = { w | na(w)%3 > nb(w)%3 } is regular, because modulus operation requires bounded storage i.e. modulus can be only 0, 1 or 2 for %3 operation and then similarly for b, it would be 0, 1 and 2. So, the states represented in DFA would be all combinations of the remainders of a and b.

  7. Patterns of W, W^r and X, Y
    • If |X| is bounded to certain length, then not regular.
      Example – L = { W X W^r | W, X belongs to {a, b}* and |X|=5 } is not regular. If W= abb then W^r = bba and X = aab, so combined string is abb aab bba. Now, X can be expanded to eat away W and W^r, but |X| = 5. So, even in expanded string, W=ab, Wr=ba and X=baabb. So, we don’t get any pattern of form (a+b)*, so not regular.

    • If |X| is unbounded and W belongs to (a+b)*, then put W as epsilon and W^r as epsilon, if we get (a+b)* as a result then the language is regular.
      Example – L = { W X W^r | W, X belongs to {a, b}* } is regular.
      If W = abb then W^r= bba and X = aab, so combined string is abb aab bba. Now, X can be expanded to eat away W and W^r. So, in expanded string, W=epsilon, W^r=epsilon and X=abb aab bba. We get simple pattern of form (a+b)* for which finite automaton can be drawn.

    • If |X| is unbounded and W belongs to (a+b)+, then put W as any symbol from alphabet and corresponding W^r, if we can still get some combination of (a+b)* as a result, with a guarantee that all strings will be of the same form, then the language is regular else not regular. This needs more explanation with an example:
      Example 1 – L = { W X W^r | W, X belongs to {a, b}+ } is regular.
      If W = abb then W^r = bba and X = aab, so combined string is abbaabbba. Now, X can be expanded to eat away W and W^r, leaving one symbol either a or b. In the expanded string, if W=a then W^r=a and if W=b then W^r=b. In above example, W=W^r=a. X=bbaabbb. It reduces to the pattern strings starting and ending in same symbol a(a+b)*a or b(a+b)*b for which finite automaton can be drawn.

      Example 2 – L = { W W^r Y | W, Y belongs to {0, 1}+} is not regular.
      If W= 101 then W^r= 101 and Y= 0111, then string is 1011010111. If Y is expanded, then W=1, W^r=0 and Y=11010111. According to this expansion, the string does not belong to the language itself, that is wrong, hence not regular.

      If W= 110 then W^r= 011 and Y= 0111, then string is 1100110111. If Y is expanded, then W=1, Wr=1 and Y=00110111. This string satisfies the language, but we can’t generalize based upon this, because strings might start with 01 and 10 as well.

      Example 3 – L = { X W W^r Y | X, Y, W belongs to {a, b}+ } is regular.
      If X= aaaba, W=ab and Y=aab. The string is aaaba abbaaab. Now, X and Y could be expanded such that W and W^r are eaten, but leaving one symbol for + i.e. the new values are X= aaabaa W= b, W^r= b. So, the language gets reduced to set of all strings containing ‘bb’ or ‘aa’ as substring, which is regular.



My Personal Notes arrow_drop_up


If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.



Improved By : Vishwesh Vinchurkar