Parikh’s Theorem

Introduction :
Parikh’s theorem, a result in theoretical computer science, says that if one looks only at the number of occurrences of each terminal symbol in the words of a context-free language, without regard to their order, then the language is indistinguishable from a regular language. It is useful for showing that strings with a given number of terminals are not accepted by a context-free grammar. It was first proved by Rohit Parikh in 1961 and republished in 1966.

Theorem :
“Parikh’s theorem” states that the Parikh image of a context-free language is semi-linear or, equivalently, that every context-free language has the same Parikh image as some regular language. Given a context-free grammar, a finite automaton recognizing such a regular language can be constructed.
The theorem can also be proved using a strengthened form of the pumping lemma for context-free languages.

1. Parikh Image –

If w is a word over some alphabet Σ, we denote by Π_Σ(w) the Parikh image of w over Σ.
Π_Σ(w) maps each character in Σ to its number of occurrences in w.
The Parikh image of a language L over Σ is {Π_Σ(w) | w ∈ L}. It is denoted by Π_Σ(L).

Examples –

Π_{a,b,c}(bccba) = (1, 2, 2) where (1, 2, 2) stands for {(a, 1),(b, 2),(c, 2)}.
Π_{a,b,c}(cabaaabb) = (4, 3, 1).
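
The two examples above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name parikh_image and the choice to pass the alphabet as an ordered string are assumptions made for illustration, not part of any library.

```python
from collections import Counter

def parikh_image(word, alphabet):
    """Return the Parikh image of `word` as a tuple ordered like `alphabet`.

    `parikh_image` is a hypothetical helper name, not a library function.
    """
    counts = Counter(word)
    unknown = set(counts) - set(alphabet)
    if unknown:
        raise ValueError(f"symbols not in alphabet: {sorted(unknown)}")
    # Counter returns 0 for symbols that never occur in the word.
    return tuple(counts[symbol] for symbol in alphabet)

print(parikh_image("bccba", "abc"))     # (1, 2, 2)
print(parikh_image("cabaaabb", "abc"))  # (4, 3, 1)
```

The order of the symbols in the alphabet argument fixes the order of the components, so "abc" gives (count of a, count of b, count of c).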

2. Derivation –
Let Σ = {a1, a2, …, ak} be an alphabet. The Parikh vector of a word is defined by the function p: Σ* → N^k, given by p(w) = (|w|_a1, |w|_a2, …, |w|_ak), where |w|_ai denotes the number of occurrences of the letter ai in the word w.

Statement 1 –
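Let L be a context-free language. Let P(L) be the set of Parikh vectors of words in L, that is, P(L) = {p(w) | w ∈ L}. Then P(L) is a semi-linear set.
A subset of N^k is linear if it has the form {v0 + n1·v1 + … + nm·vm | n1, …, nm ∈ N} for fixed vectors v0, v1, …, vm in N^k, and semi-linear if it is a finite union of linear sets. The semi-linear sets are exactly the Presburger-definable subsets of N^k.
Two languages are said to be commutatively equivalent if they have the same set of Parikh vectors.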
Statement 2 –
If S is any semi-linear set, the language of words whose Parikh vectors are in S is commutatively equivalent to some regular language. Thus, every context-free language is commutatively equivalent to some regular language.
Together, these two statements can be summed up by saying that the images under p of the context-free languages and of the regular languages are the same, namely the family of semi-linear sets.
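
To make the notion of a semi-linear set concrete, the sketch below tests membership by brute force. It assumes a linear set is given as a base vector plus a list of period vectors (all with non-negative integer entries) and a semi-linear set as a finite list of such pairs. The names in_linear_set and in_semilinear_set are hypothetical, and the search is meant for small vectors only.

```python
def in_linear_set(v, base, periods):
    """Is v = base + n1*periods[0] + ... + nm*periods[m-1] with all ni >= 0?"""
    rest = tuple(x - b for x, b in zip(v, base))
    if any(r < 0 for r in rest):
        return False
    # A zero period vector contributes nothing and would loop forever below.
    periods = [p for p in periods if any(p)]

    def search(rest, i):
        if i == len(periods):
            return all(r == 0 for r in rest)
        # Try coefficient 0, 1, 2, ... for periods[i] until a component goes negative.
        while all(r >= 0 for r in rest):
            if search(rest, i + 1):
                return True
            rest = tuple(r - x for r, x in zip(rest, periods[i]))
        return False

    return search(rest, 0)

def in_semilinear_set(v, linear_sets):
    """A semi-linear set is a finite union of linear sets (base, periods)."""
    return any(in_linear_set(v, base, periods) for base, periods in linear_sets)

# Parikh vectors of {a^n b^n | n >= 0}: base (0, 0), single period (1, 1).
diagonal = [((0, 0), [(1, 1)])]
print(in_semilinear_set((3, 3), diagonal))  # True
print(in_semilinear_set((3, 2), diagonal))  # False
```

The set {(n, n) | n ≥ 0} used here is a single linear set, so it is semi-linear, in line with Statement 1.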

For Bounded Languages :
A language L is bounded if L is a subset of w1* … wk* for some fixed words w1, …, wk. Ginsburg and Spanier gave a necessary and sufficient condition, similar to Parikh’s theorem, for a bounded language to be context-free.
The Ginsburg-Spanier theorem says that a bounded language L is context-free if and only if {(n1, …, nk) | w1^n1 … wk^nk ∈ L} is a stratified semi-linear set.
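
The exponent set in the Ginsburg-Spanier theorem can be explored for small exponents with a short enumeration. The sketch below is illustrative only: exponent_vectors, in_L and the bound max_exp are hypothetical names, the example language {0ⁿ1ⁿ | n ≥ 1} ⊆ 0*1* is chosen for illustration, and the code inspects only a finite window of exponents. It does not decide whether the set is stratified.

```python
from itertools import product

def exponent_vectors(words, in_language, max_exp):
    """Collect all (n1, ..., nk) with 0 <= ni <= max_exp and w1^n1 ... wk^nk in L."""
    found = set()
    for ns in product(range(max_exp + 1), repeat=len(words)):
        candidate = "".join(w * n for w, n in zip(words, ns))
        if in_language(candidate):
            found.add(ns)
    return found

def in_L(s):
    """Membership test for L = {0^n 1^n | n >= 1}."""
    half = len(s) // 2
    return half >= 1 and s == "0" * half + "1" * half

print(sorted(exponent_vectors(["0", "1"], in_L, 3)))  # [(1, 1), (2, 2), (3, 3)]
```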

Example –

Context-free language (CFL) L = {0ⁿ1ⁿ | n ≥ 1}
Number of 0’s (N₀) = n
Number of 1’s (N₁) = n
N₀ = N₁
Let the string w = ‘000111’. Here, N₀ = 3 and N₁ = 3, and w has the form 0³1³, so w ∈ L. Note that N₀ = N₁ alone is necessary but not sufficient for membership in L: ‘0101’ has N₀ = N₁ = 2 but is not in L.
Parikh’s theorem is about counting the occurrences of each input symbol in a string, ignoring their order; here we count the number of 0’s and the number of 1’s.
The Parikh vector of this string is given by the number of 0’s (|w|₀) and the number of 1’s (|w|₁) in the string.
P(w) = (|w|₀, |w|₁)

Parikh vector :
P(‘000111’) = (3, 3)

Here is another example. Let L = {0ⁿ1ⁿ | 1 ≤ n < 3}. The possible values of n are 1 and 2.
For n = 1 the string is ‘01’, and for n = 2 the string is ‘0011’.

The Parikh vector of each string is again given by the number of 0’s (|w|₀) and the number of 1’s (|w|₁) in the string.
P(w) = (|w|₀, |w|₁)
So, P(‘01’) = (1, 1)
and similarly, P(‘0011’) = (2, 2)
P(L) is the set of Parikh vectors of words in L.
Then, here P(L) = {P(‘01’), P(‘0011’)} = {(1, 1), (2, 2)}
Since L contains only finitely many strings, it is itself a regular language and a DFA for it can be constructed easily, so L is trivially commutatively equivalent to a regular language. The first example, {0ⁿ1ⁿ | n ≥ 1}, is not regular, but its set of Parikh vectors {(n, n) | n ≥ 1} is also the set of Parikh vectors of the regular language (01)⁺.
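
The finite example can be checked mechanically. The following sketch computes the set of Parikh vectors of the two words and describes the language by a plain alternation of its words, which is one simple way to see that a finite language is regular. The names parikh_vector, parikh_set and pattern are assumptions made for this illustration.

```python
import re

def parikh_vector(word, alphabet="01"):
    """Count each symbol of `alphabet` in `word`, in alphabet order."""
    return tuple(word.count(symbol) for symbol in alphabet)

L = ["01", "0011"]  # the words of {0^n 1^n | 1 <= n < 3}
parikh_set = {parikh_vector(w) for w in L}

# A finite language is regular: an alternation of its words describes it exactly.
pattern = re.compile("|".join(re.escape(w) for w in L))

print(sorted(parikh_set))               # [(1, 1), (2, 2)]
print(bool(pattern.fullmatch("0011")))  # True
print(bool(pattern.fullmatch("0101")))  # False
```

The word ‘0101’ has the Parikh vector (2, 2) but is rejected, which shows that the Parikh vectors describe the symbol counts of a language, not the language itself.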

Some corollaries :

Every CFL is “letter-equivalent” to a regular language, i.e. it has the same Parikh image as that regular language.
For example: ψ({aⁿbⁿ | n ≥ 0}) = ψ((ab)*), where ψ denotes the Parikh image.
The set of lengths of the words in a CFL forms an ultimately periodic set.
CFLs over a single-letter alphabet are regular.
The theorem is useful for showing that strings with a given number of terminals are not accepted by a context-free grammar.
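
The letter-equivalence example and the periodicity of word lengths can be spot-checked on a finite prefix of both languages. This is a sanity check under an arbitrary bound N (a hypothetical parameter), not a proof, and parikh_vector is an illustrative helper name.

```python
def parikh_vector(word, alphabet="ab"):
    """Count each symbol of `alphabet` in `word`, in alphabet order."""
    return tuple(word.count(symbol) for symbol in alphabet)

N = 6  # hypothetical bound on n, chosen only to keep the check finite

cfl_words = ["a" * n + "b" * n for n in range(N + 1)]  # a^n b^n
regular_words = ["ab" * n for n in range(N + 1)]       # (ab)^n

same_image = ({parikh_vector(w) for w in cfl_words}
              == {parikh_vector(w) for w in regular_words})

# Word lengths of a^n b^n: consecutive lengths differ by a constant period.
lengths = sorted({len(w) for w in cfl_words})
gaps = {b - a for a, b in zip(lengths, lengths[1:])}

print(same_image)  # True
print(lengths)     # [0, 2, 4, 6, 8, 10, 12]
print(gaps)        # {2}
```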

Significance :
The theorem has multiple interpretations. It shows that a context-free language over a singleton alphabet must be a regular language and that some context-free languages can only have ambiguous grammars. Such languages are called inherently ambiguous languages. From a formal grammar perspective, this means that some ambiguous context-free grammars cannot be converted to equivalent unambiguous context-free grammars.
