Prerequisite – Simplifying Context Free Grammars

A context-free grammar (CFG) is in Chomsky Normal Form (CNF) if every production rule has one of the following forms:

- A non-terminal generating a terminal (e.g., X->x)
- A non-terminal generating two non-terminals (e.g., X->YZ)
- The start symbol generating ε (e.g., S->ε), allowed only if S does not appear on the right-hand side (RHS) of any production

Consider the following grammars:

G1 = {S->a, S->AZ, A->a, Z->z}

G2 = {S->a, S->aZ, Z->a}

The grammar G1 is in CNF because all of its production rules satisfy the CNF conditions. However, G2 is not in CNF because the production rule S->aZ has a terminal followed by a non-terminal on its RHS, which none of the CNF forms allows.
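The three forms can be checked mechanically. Below is a minimal Python sketch, assuming a hypothetical encoding in which a grammar is a dict from each non-terminal to a list of right-hand sides, each right-hand side is a tuple of symbols, the empty tuple stands for ε, and any symbol that is not a dict key is a terminal. The function name `is_cnf` is illustrative only.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def is_cnf(grammar: Grammar, start: str) -> bool:
    """Return True if every production has one of the three CNF forms."""
    nonterminals = set(grammar)  # assumption: every non-terminal is a dict key
    start_on_rhs = any(start in body for bodies in grammar.values() for body in bodies)
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1 and body[0] not in nonterminals:
                continue  # X -> x
            if len(body) == 2 and all(sym in nonterminals for sym in body):
                continue  # X -> YZ
            if len(body) == 0 and head == start and not start_on_rhs:
                continue  # S -> ε, with S not used on any RHS
            return False
    return True


G1 = {"S": [("a",), ("A", "Z")], "A": [("a",)], "Z": [("z",)]}
G2 = {"S": [("a",), ("a", "Z")], "Z": [("a",)]}
print(is_cnf(G1, "S"))  # True
print(is_cnf(G2, "S"))  # False
```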

**Note –**

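- A given grammar can have more than one equivalent CNF.
- The CNF grammar generates the same language as the original CFG.
- CNF is used as a preprocessing step by many algorithms on CFGs, such as the CYK membership algorithm and bottom-up parsers.
- Generating a string w of length n (n ≥ 1) takes exactly 2n-1 derivation steps in CNF: n-1 applications of rules of the form X->YZ (each adds one symbol to the sentential form) and n applications of rules of the form X->x.
- Any context-free grammar whose language does not contain ε has an equivalent CNF grammar. If ε is in the language, the CNF grammar additionally needs the rule S->ε.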
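CYK relies on CNF because it fills its table bottom-up, splitting every substring into exactly two parts that must match some X->YZ rule. The following is a minimal Python sketch of the CYK membership test, assuming the same hypothetical dict-of-tuples encoding (non-terminals are dict keys, `()` is ε) and a grammar that is already in CNF; it does not convert the grammar itself.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def cyk(grammar: Grammar, start: str, word: str) -> bool:
    """CYK membership test for a grammar that is already in CNF."""
    n = len(word)
    if n == 0:
        return () in grammar.get(start, [])  # only S -> ε derives the empty string
    # table[i][l] holds the non-terminals deriving the substring word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head, bodies in grammar.items():
            if (ch,) in bodies:  # X -> x
                table[i][0].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):  # the left part has `split` symbols
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length - 1].add(head)  # X -> YZ
    return start in table[0][n - 1]


G1 = {"S": [("a",), ("A", "Z")], "A": [("a",)], "Z": [("z",)]}
print(cyk(G1, "S", "az"))  # True
print(cyk(G1, "S", "za"))  # False
```

The table has O(n²) cells and each is filled by trying O(n) split points, so the test runs in O(n³) time times the grammar size.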

**How to convert CFG to CNF?**

**Step 1.** Eliminate the start symbol from the RHS.

If the start symbol S appears on the RHS of any production in the grammar, create a new production:

S0->S

where S0 is the new start symbol.
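Step 1 amounts to a single check and at most one new rule. A minimal Python sketch, assuming the hypothetical dict-of-tuples encoding (a dict from each non-terminal to a list of RHS tuples); the default name `S0` follows the text, while the function name `add_new_start` is illustrative.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def add_new_start(grammar: Grammar, start: str, new_start: str = "S0") -> tuple[Grammar, str]:
    """Step 1: if the start symbol occurs on some RHS, add new_start -> start."""
    on_rhs = any(start in body for bodies in grammar.values() for body in bodies)
    if not on_rhs:
        return grammar, start  # nothing to do
    if new_start in grammar:
        raise ValueError(f"{new_start!r} is already a non-terminal")
    result = {new_start: [(start,)]}  # the new start symbol comes first
    result.update(grammar)
    return result, new_start


g = {"S": [("A", "S", "B")], "A": [("a",)], "B": [("b",)]}
g, start = add_new_start(g, "S")
print(start, g["S0"])  # S0 [('S',)]
```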

**Step 2.** Eliminate null, unit and useless productions.

If the CFG contains null, unit or useless productions, eliminate them. Refer to the prerequisite article, Simplifying Context Free Grammars, for how to eliminate each of these types of production rules.
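The elimination of null and unit productions can be sketched as follows. This minimal Python version assumes the hypothetical dict-of-tuples encoding (non-terminals are dict keys, `()` is ε) and that Step 1 has already been applied; it covers only null and unit productions, not useless ones. Rather than removing one null production at a time, it first computes the set of nullable non-terminals and then adds every variant of each RHS with some nullable symbols dropped. All function names are illustrative. The usage lines apply it to this article's example grammar after its Step 1.

```python
from itertools import product

Grammar = dict[str, list[tuple[str, ...]]]


def nullable_symbols(grammar: Grammar) -> set[str]:
    """Fixed point: X is nullable if some RHS of X consists only of nullable symbols."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)  # an empty body () makes all(...) True
                changed = True
    return nullable


def remove_null(grammar: Grammar, start: str) -> Grammar:
    """Add every variant of each RHS with nullable symbols dropped; keep ε only for start."""
    nullable = nullable_symbols(grammar)
    result: Grammar = {}
    for head, bodies in grammar.items():
        new_bodies: list[tuple[str, ...]] = []
        for body in bodies:
            # each nullable symbol is either kept (s,) or dropped ()
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for combo in product(*choices):
                variant = tuple(sym for part in combo for sym in part)
                if (variant or head == start) and variant not in new_bodies:
                    new_bodies.append(variant)
        result[head] = new_bodies
    return result


def remove_units(grammar: Grammar) -> Grammar:
    """Replace X -> Y chains by copying the non-unit bodies of every Y reachable from X."""
    nonterminals = set(grammar)

    def is_unit(body: tuple[str, ...]) -> bool:
        return len(body) == 1 and body[0] in nonterminals

    result: Grammar = {}
    for head in grammar:
        reachable, stack = {head}, [head]
        while stack:  # depth-first search over unit productions
            for body in grammar[stack.pop()]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        result[head] = []
        for target in grammar:  # dict order keeps the output deterministic
            if target in reachable:
                for body in grammar[target]:
                    if not is_unit(body) and body not in result[head]:
                        result[head].append(body)
    return result


g = {
    "S0": [("S",)],
    "S": [("A", "S", "B")],
    "A": [("a", "A", "S"), ("a",), ()],
    "B": [("S", "b", "S"), ("A",), ("b", "b")],
}
g = remove_units(remove_null(g, "S0"))
print(g["S0"])  # [('A', 'S', 'B'), ('A', 'S'), ('S', 'B')]
```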

**Step 3.** Eliminate terminals from the RHS wherever they appear together with other terminals or non-terminals. For example, the production rule X->xY can be decomposed as:

X->ZY

Z->x
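A minimal Python sketch of Step 3, assuming the hypothetical dict-of-tuples encoding (non-terminals are dict keys, anything else is a terminal). The naming scheme `T_x` for the new non-terminal is an assumption; the sketch reuses one new non-terminal per terminal, so a rule like X->bb becomes X->T_b T_b rather than introducing two separate variables.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def isolate_terminals(grammar: Grammar) -> Grammar:
    """Step 3: in every RHS of length >= 2, replace each terminal x by a new non-terminal T_x -> x."""
    nonterminals = set(grammar)
    term_vars: dict[str, str] = {}  # terminal -> the non-terminal introduced for it
    result: Grammar = {head: [] for head in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) < 2:
                result[head].append(body)  # X -> x and X -> ε are left alone
                continue
            new_body = []
            for sym in body:
                if sym not in nonterminals and sym not in term_vars:
                    name = f"T_{sym}"  # hypothetical naming scheme
                    if name in nonterminals:
                        raise ValueError(f"name clash: {name!r}")
                    term_vars[sym] = name
                new_body.append(sym if sym in nonterminals else term_vars[sym])
            result[head].append(tuple(new_body))
    for terminal, name in term_vars.items():
        result[name] = [(terminal,)]  # T_x -> x
    return result


print(isolate_terminals({"X": [("x", "Y")], "Y": [("y",)]}))
# {'X': [('T_x', 'Y')], 'Y': [('y',)], 'T_x': [('x',)]}
```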

**Step 4.** Eliminate right-hand sides with more than two non-terminals.

For example, the production rule X->XYZ can be decomposed as:

X->PZ

P->XY
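Step 4 can be applied repeatedly to right-hand sides of any length by always grouping the first two symbols, as in the X->XYZ example. A minimal Python sketch, assuming the hypothetical dict-of-tuples encoding and that it runs after Step 3 (so every long RHS contains only non-terminals); the generated names `P1`, `P2` and so on are hypothetical.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def binarize(grammar: Grammar) -> Grammar:
    """Step 4: split every RHS longer than two symbols, grouping from the left."""
    taken = set(grammar)
    result: Grammar = {head: [] for head in grammar}
    counter = 0
    for head, bodies in grammar.items():
        for body in bodies:
            while len(body) > 2:
                counter += 1
                while f"P{counter}" in taken:  # hypothetical names P1, P2, ...
                    counter += 1
                name = f"P{counter}"
                taken.add(name)
                result[name] = [body[:2]]  # P -> first two symbols
                body = (name,) + body[2:]  # X -> P followed by the rest
            result[head].append(body)
    return result


print(binarize({"X": [("X", "Y", "Z")], "Y": [("y",)], "Z": [("z",)]}))
# {'X': [('P1', 'Z')], 'Y': [('y',)], 'Z': [('z',)], 'P1': [('X', 'Y')]}
```

A RHS of length k ≥ 3 gains k-2 new non-terminals, each with a two-symbol body.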

**Example –** Let us convert a CFG to CNF. Consider the grammar G:

S → ASB

A → aAS | a | ε

B → SbS | A | bb

**Step 1.** As the start symbol S appears on the RHS, we create a new production rule S0 → S. The grammar becomes:

S0 → S

S → ASB

A → aAS | a | ε

B → SbS | A | bb

**Step 2.** As the grammar contains the null production A → ε, removing it yields:

S0 → S

S → ASB | SB

A → aAS | aS | a

B → SbS | A | ε | bb

This creates the null production B → ε; removing it yields:

S0 → S

S → AS | ASB | SB | S

A → aAS | aS | a

B → SbS | A | bb

This leaves the unit production B → A; removing it yields:

S0 → S

S → AS | ASB | SB | S

A → aAS | aS | a

B → SbS | bb | aAS | aS | a

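Next, remove the unit production S0 → S by giving S0 all the productions of S. Since S → S is among them, the unit production S0 → S reappears:

S0 → AS | ASB | SB | S

S → AS | ASB | SB | S

A → aAS | aS | a

B → SbS | bb | aAS | aS | a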

Finally, the unit production S → S is redundant and can simply be deleted, and so can its copy S0 → S. This yields:

S0 → AS | ASB | SB

S → AS | ASB | SB

A → aAS | aS | a

B → SbS | bb | aAS | aS | a

**Step 3.** In the production rules A → aAS | aS and B → SbS | aAS | aS, the terminals a and b appear on the RHS together with non-terminals. Replacing them with new non-terminals X → a and Y → b yields:

S0 → AS | ASB | SB

S → AS | ASB | SB

A → XAS | XS | a

B → SYS | bb | XAS | XS | a

X → a

Y → b

Also, B → bb is not allowed in CNF; replacing it with B → VV and V → b yields:

S0 → AS | ASB | SB

S → AS | ASB | SB

A → XAS | XS | a

B → SYS | VV | XAS | XS | a

X → a

Y → b

V → b

**Step 4.** In the production rule S0 → ASB, the RHS has more than two symbols; replacing it with S0 → PB and P → AS yields:

S0 → AS | PB | SB

S → AS | ASB | SB

A → XAS | XS | a

B → SYS | VV | XAS | XS | a

X → a

Y → b

V → b

P → AS

Similarly, S → ASB has more than two symbols; replacing it with S → QB and Q → AS yields:

S0 → AS | PB | SB

S → AS | QB | SB

A → XAS | XS | a

B → SYS | VV | XAS | XS | a

X → a

Y → b

V → b

P → AS

Q → AS

Similarly, A → XAS has more than two symbols; replacing it with A → RS and R → XA yields:

S0 → AS | PB | SB

S → AS | QB | SB

A → RS | XS | a

B → SYS | VV | XAS | XS | a

X → a

Y → b

V → b

P → AS

Q → AS

R → XA

Similarly, B → SYS has more than two symbols; replacing it with B → TS and T → SY yields:

S0 → AS | PB | SB

S → AS | QB | SB

A → RS | XS | a

B → TS | VV | XAS | XS | a

X → a

Y → b

V → b

P → AS

Q → AS

R → XA

T → SY

Similarly, B → XAS has more than two symbols; replacing it with B → US and U → XA yields:

S0 → AS | PB | SB

S → AS | QB | SB

A → RS | XS | a

B → TS | VV | US | XS | a

X → a

Y → b

V → b

P → AS

Q → AS

R → XA

T → SY

U → XA

So this is the required CNF for given grammar.
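As a sanity check, the final grammar can be transcribed into a hypothetical Python encoding (a dict from each non-terminal to a list of RHS tuples, with any symbol that is not a dict key treated as a terminal) and every production tested against the two CNF forms that it uses. This is only a syntactic check under that assumption; it does not verify that the language is preserved. The function name `cnf_violations` is illustrative.

```python
Grammar = dict[str, list[tuple[str, ...]]]

# Final grammar of the example, transcribed from the text above.
final: Grammar = {
    "S0": [("A", "S"), ("P", "B"), ("S", "B")],
    "S": [("A", "S"), ("Q", "B"), ("S", "B")],
    "A": [("R", "S"), ("X", "S"), ("a",)],
    "B": [("T", "S"), ("V", "V"), ("U", "S"), ("X", "S"), ("a",)],
    "X": [("a",)], "Y": [("b",)], "V": [("b",)],
    "P": [("A", "S")], "Q": [("A", "S")],
    "R": [("X", "A")], "T": [("S", "Y")], "U": [("X", "A")],
}


def cnf_violations(grammar: Grammar) -> list[tuple[str, tuple[str, ...]]]:
    """List productions that are neither X -> x nor X -> YZ (this grammar has no S0 -> ε)."""
    nts = set(grammar)
    bad = []
    for head, bodies in grammar.items():
        for body in bodies:
            terminal_rule = len(body) == 1 and body[0] not in nts
            binary_rule = len(body) == 2 and body[0] in nts and body[1] in nts
            if not (terminal_rule or binary_rule):
                bad.append((head, body))
    return bad


print(cnf_violations(final))  # []
```

Because an undefined symbol would count as a terminal, a missing rule such as T → SY would also show up here as a violation of B → TS.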