
Removing Direct and Indirect Left Recursion in a Grammar


Prerequisite – Classification of Context Free Grammars, Ambiguity and Parsers 

Introduction: Left recursion is a common problem in the syntax analysis phase of compilation. A top-down parser (for example, a recursive-descent or LL parser) that expands a left-recursive rule can recurse forever without consuming any input, so left recursion must be removed before such a parser can use the grammar. This article presents an algorithm for removing direct and indirect left recursion, along with examples and an explanation of each step.

Left Recursion: A grammar of the form,

S ⇒ S a | b 

is called left recursive, where S is any non-terminal and a and b are any strings of terminals. The rule is left recursive because its first alternative begins with S itself.

Problem with Left Recursion: If a grammar contains left recursion, a top-down parser may enter an infinite loop during syntax analysis. This is because each time the parser expands S, the first symbol it must expand is again S, so it keeps expanding S without ever consuming an input token.
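
The infinite loop can be reproduced with a small sketch. It assumes a naive recursive-descent procedure for the illustrative rule S ⇒ S a | b; the names `parse_S` and `loops_forever` are hypothetical and not part of the implementation given later in this article.

```python
def parse_S(tokens, pos):
    """Naive recursive-descent procedure for the rule S => S a | b."""
    # First alternative: S a. The procedure calls itself before consuming
    # any input, so `pos` never advances and the recursion never ends.
    end = parse_S(tokens, pos)
    if end is not None and end < len(tokens) and tokens[end] == "a":
        return end + 1
    # Second alternative: b. This point is never reached.
    if pos < len(tokens) and tokens[pos] == "b":
        return pos + 1
    return None


def loops_forever(tokens):
    """Return True if parsing runs into Python's recursion limit."""
    try:
        parse_S(tokens, 0)
    except RecursionError:
        return True
    return False


print(loops_forever(list("baa")))
```

Python stops the runaway recursion with a `RecursionError`; a parser written in a language without such a guard would overflow the stack instead.
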
Algorithm to Remove Left Recursion with an example: Suppose we have a grammar which contains left recursion:

S ⇒ S a | S b | c | d 

Check whether the given grammar contains left recursion. If it does, take the left-recursive production and work on it separately. In our example:

S ⇒ S a | S b | c | d   

Introduce a new nonterminal and append it to every alternative that does not begin with S. We create a new nonterminal S' and write the new production as:

S ⇒ cS' | dS' 

Write the new nonterminal S' on the LHS. On the RHS it produces either ε or one of the left-recursive alternatives with the leading S removed and S' appended at the end.

S' ⇒ ε | aS' | bS'

So, after conversion, the new equivalent production is:

S ⇒ cS' | dS'
S' ⇒ ε | aS' | bS'
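
The steps above can be sketched compactly in Python. This is a minimal illustration, assuming each alternative is stored as a list of symbols and that the grammar has no S ⇒ S alternative; the function name `remove_immediate_left_recursion` is hypothetical.

```python
EPSILON = "ε"


def remove_immediate_left_recursion(head, alternatives):
    """Rewrite head => head α | β as head => β head' and head' => ε | α head'.

    `alternatives` is a list of alternatives, each a list of symbols.
    Returns a dict that maps every non-terminal to its new alternatives.
    """
    new_head = head + "'"
    # α parts: what follows the leading `head` in left-recursive alternatives.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # β parts: alternatives that do not start with `head`.
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}  # nothing to do
    return {
        # If there is no β at all, fall back to the bare new non-terminal.
        head: [beta + [new_head] for beta in (betas or [[]])],
        new_head: [[EPSILON]] + [alpha + [new_head] for alpha in alphas],
    }


grammar = remove_immediate_left_recursion(
    "S", [["S", "a"], ["S", "b"], ["c"], ["d"]]
)
for head, alts in grammar.items():
    print(head, "⇒", " | ".join(" ".join(alt) for alt in alts))
```

For the example grammar this yields the same two productions as the manual conversion: S with the alternatives c S' and d S', and S' with ε, a S' and b S'.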

Indirect Left Recursion: A grammar is said to have indirect left recursion if, starting from some non-terminal of the grammar, it is possible to derive in two or more steps a string whose leftmost symbol is that same non-terminal. For example,

A ⇒ B r 
B ⇒ C d
C ⇒ A t 

where A, B, C are non-terminals and r, d, t are terminals. Here, starting with A, we can derive a string that begins with A again by expanding B and then C: A ⇒ B r ⇒ C d r ⇒ A t d r.
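
One way to detect this situation mechanically is to follow the leftmost symbol of every alternative and check whether a non-terminal can reach itself. The sketch below assumes a grammar without ε-productions, stored as a dict of symbol lists; the name `left_recursive_nonterminals` is hypothetical.

```python
def left_recursive_nonterminals(grammar):
    """Return the non-terminals that can derive a string starting with themselves."""
    # Edge X -> Y whenever some alternative of X begins with non-terminal Y.
    first = {
        head: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for head, alts in grammar.items()
    }
    result = set()
    for start in grammar:
        seen, stack = set(), list(first[start])
        while stack:  # depth-first search over leftmost symbols
            symbol = stack.pop()
            if symbol == start:
                result.add(start)
                break
            if symbol not in seen:
                seen.add(symbol)
                stack.extend(first[symbol])
    return result


grammar = {
    "A": [["B", "r"]],
    "B": [["C", "d"]],
    "C": [["A", "t"]],
}
print(sorted(left_recursive_nonterminals(grammar)))
```

All three non-terminals are reported, because each one lies on the cycle A, B, C, A.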

Algorithm to remove Indirect Left Recursion with the help of an example:

A1 ⇒ A2 A3
A2 ⇒ A3 A1 | b
A3 ⇒ A1 A1 | a 

Where A1, A2, A3 are non terminals and a, b are terminals.

Identify the productions which can cause indirect left recursion. In our case,

A3 ⇒ A1 A1 | a

Replace the leading non-terminal of this production with that non-terminal's own productions: substitute A1 ⇒ A2 A3 for the first A1 in the production of A3. 

A3 ⇒ A2 A3 A1 | a

Now in this production substitute A2 ⇒ A3 A1 | b 

A3 ⇒ (A3 A1 | b) A3 A1 | a 

and then distributing,

A3 ⇒ A3 A1 A3 A1 | b A3 A1 | a
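
The substitute-and-distribute step can be sketched as a small helper. It assumes alternatives are stored as lists of symbols; the name `substitute_leading` is hypothetical.

```python
def substitute_leading(alternatives, target, target_alternatives):
    """Replace a leading `target` by each alternative of `target` in turn."""
    result = []
    for alt in alternatives:
        if alt and alt[0] == target:
            # Distribute: one new alternative per alternative of `target`.
            result.extend(body + alt[1:] for body in target_alternatives)
        else:
            result.append(alt)
    return result


a1 = [["A2", "A3"]]
a2 = [["A3", "A1"], ["b"]]
a3 = [["A1", "A1"], ["a"]]

a3 = substitute_leading(a3, "A1", a1)  # A3 => A2 A3 A1 | a
a3 = substitute_leading(a3, "A2", a2)  # A3 => A3 A1 A3 A1 | b A3 A1 | a
print(" | ".join(" ".join(alt) for alt in a3))
```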

The production now contains direct left recursion, which can be removed with the direct left recursion method described earlier. 

To eliminate the direct left recursion as before, introduce a new nonterminal and append it to every alternative that does not begin with A3. We create a new nonterminal A' and write the new productions as:

A3 ⇒ b A3 A1 A' | aA'
A' ⇒ ε | A1 A3 A1 A'

ε can be distributed to avoid an empty term:

A3 ⇒ b A3 A1 | a | b A3 A1 A' | aA'
A' ⇒ A1 A3 A1 | A1 A3 A1 A'
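
Distributing ε can also be sketched in a few lines. The sketch assumes that the nullable non-terminal appears only as the last symbol of an alternative and that its ε alternative is the only ε-production in the grammar; the name `distribute_epsilon` is hypothetical.

```python
EPSILON = "ε"


def distribute_epsilon(grammar, nullable):
    """Drop `nullable => ε` and add, for every alternative ending in
    `nullable`, a copy without that trailing symbol."""
    result = {}
    for head, alts in grammar.items():
        # Keep everything except the ε alternative of `nullable` itself.
        kept = [alt for alt in alts
                if not (head == nullable and alt == [EPSILON])]
        # Copies in which the trailing `nullable` has been replaced by ε.
        shortened = [alt[:-1] for alt in kept
                     if len(alt) > 1 and alt[-1] == nullable]
        result[head] = shortened + kept
    return result


grammar = {
    "A3": [["b", "A3", "A1", "A'"], ["a", "A'"]],
    "A'": [[EPSILON], ["A1", "A3", "A1", "A'"]],
}
for head, alts in distribute_epsilon(grammar, "A'").items():
    print(head, "⇒", " | ".join(" ".join(alt) for alt in alts))
```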

The resulting grammar is then:

A1 ⇒ A2 A3
A2 ⇒ A3 A1 | b
A3 ⇒ b A3 A1 | a | b A3 A1 A' | aA'
A' ⇒ A1 A3 A1 | A1 A3 A1 A'
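
As a sanity check, the original and the resulting grammar can be compared on all short strings they generate. The sketch below assumes A1 is the start symbol (the example does not name one) and that the grammars contain no ε-productions; the helper name `language` is hypothetical.

```python
def language(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from `start`."""
    lang = {head: set() for head in grammar}
    changed = True
    while changed:  # iterate to a fixed point; the sets are finite, so this stops
        changed = False
        for head, alts in grammar.items():
            for alt in alts:
                strings = {""}
                for sym in alt:
                    # A non-terminal contributes its known strings,
                    # a terminal contributes itself.
                    options = lang[sym] if sym in grammar else {sym}
                    strings = {s + o for s in strings for o in options
                               if len(s + o) <= max_len}
                if not strings <= lang[head]:
                    lang[head] |= strings
                    changed = True
    return lang[start]


original = {
    "A1": [["A2", "A3"]],
    "A2": [["A3", "A1"], ["b"]],
    "A3": [["A1", "A1"], ["a"]],
}
transformed = {
    "A1": [["A2", "A3"]],
    "A2": [["A3", "A1"], ["b"]],
    "A3": [["b", "A3", "A1"], ["a"], ["b", "A3", "A1", "A'"], ["a", "A'"]],
    "A'": [["A1", "A3", "A1"], ["A1", "A3", "A1", "A'"]],
}
same = language(original, "A1", 6) == language(transformed, "A1", 6)
print(same)
```

Agreement on bounded-length strings is not a proof of equivalence, but it catches most slips made while substituting and distributing by hand.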

Implementation:

C++




#include <iostream>
#include <string>
#include <vector>
using namespace std;
 
class NonTerminal {
    string name;                    // Stores the Head of production rule
    vector<string> productionRules; // Stores the body of production rules
 
public:
    NonTerminal(string name) {
        this->name = name;
    }
 
    // Returns the head of the production rule
    string getName() {
        return name;
    }
 
    // Replaces the bodies of the production rules
    void setRules(vector<string> rules) {
        productionRules.clear();
        for (auto rule : rules){
            productionRules.push_back(rule);
        }
    }
 
    vector<string> getRules() {
        return productionRules;
    }
 
    void addRule(string rule) {
        productionRules.push_back(rule);
    }
 
    // Prints the production rules
    void printRule() {
        string toPrint = "";
        toPrint += name + " ->";
 
        for (string s : productionRules){
            toPrint += " " + s + " |";
        }
 
        toPrint.pop_back();
        cout << toPrint << endl;
    }
};
 
class Grammar {
    vector<NonTerminal> nonTerminals;
 
public:
    // Add rules to the grammar
    void addRule(string rule) {
        bool nt = 0;
        string parse = "";
 
        for (char c : rule){
            if (c == ' ') {
                if (!nt) {
                    NonTerminal newNonTerminal(parse);
                    nonTerminals.push_back(newNonTerminal);
                    nt = 1;
                    parse = "";
                } else if (parse.size()){
                    nonTerminals.back().addRule(parse);
                    parse = "";
                }
            }else if (c != '|' && c != '-' && c != '>'){
                parse += c;
            }
        }
        if (parse.size()){
            nonTerminals.back().addRule(parse);
        }
    }
 
    void inputData() {
 
        
        addRule("S -> Sa | Sb | c | d");
 
    }
 
    // Algorithm for eliminating the non-Immediate Left Recursion
    void solveNonImmediateLR(NonTerminal &A, NonTerminal &B) {
        string nameA = A.getName();
        string nameB = B.getName();
 
        vector<string> rulesA, rulesB, newRulesA;
        rulesA = A.getRules();
        rulesB = B.getRules();
 
        for (auto rule : rulesA) {
            if (rule.substr(0, nameB.size()) == nameB) {
                for (auto rule1 : rulesB){
                    newRulesA.push_back(rule1 + rule.substr(nameB.size()));
                }
            }
            else{
                newRulesA.push_back(rule);
            }
        }
        A.setRules(newRulesA);
    }
 
    // Algorithm for eliminating Immediate Left Recursion
    void solveImmediateLR(NonTerminal &A) {
        string name = A.getName();
        string newName = name + "'";
 
        vector<string> alphas, betas, rules, newRulesA, newRulesA1;
        rules = A.getRules();
 
        // Checks if there is left recursion or not
        for (auto rule : rules) {
            if (rule.substr(0, name.size()) == name){
                alphas.push_back(rule.substr(name.size()));
            }
            else{
                betas.push_back(rule);
            }
        }
 
        // If no left recursion, exit
        if (!alphas.size())
            return;
 
        if (!betas.size())
            newRulesA.push_back(newName);
 
        for (auto beta : betas)
            newRulesA.push_back(beta + newName);
 
        for (auto alpha : alphas)
            newRulesA1.push_back(alpha + newName);
 
        // Amends the original rule
        A.setRules(newRulesA);
        newRulesA1.push_back("\u03B5");
 
        // Adds new production rule
        NonTerminal newNonTerminal(newName);
        newNonTerminal.setRules(newRulesA1);
        nonTerminals.push_back(newNonTerminal);
    }
 
    // Eliminates left recursion
    void applyAlgorithm() {
        int size = nonTerminals.size();
        for (int i = 0; i < size; i++){
            for (int j = 0; j < i; j++){
                solveNonImmediateLR(nonTerminals[i], nonTerminals[j]);
            }
            solveImmediateLR(nonTerminals[i]);
        }
    }
 
    // Print all the rules of grammar
    void printRules() {
        for (auto nonTerminal : nonTerminals){
            nonTerminal.printRule();
        }
    }
};
 
int main(){
    //freopen("output.txt", "w+", stdout);
 
    Grammar grammar;
    grammar.inputData();
    grammar.applyAlgorithm();
    grammar.printRules();
 
    return 0;
}


Java




import java.util.*;
 
class NonTerminal{
    private String name;
    private ArrayList<String> rules;
 
    public NonTerminal(String name) {
        this.name = name;
        rules = new ArrayList<>();
    }
 
    public void addRule(String rule) {
        rules.add(rule);
    }
 
    public void setRules(ArrayList<String> rules) {
        this.rules = rules;
    }
 
    public String getName() {
        return name;
    }
 
    public ArrayList<String> getRules() {
        return rules;
    }
 
    public void printRule() {
        System.out.print(name + " -> ");
        for (int i = 0; i < rules.size(); i++){
            System.out.print(rules.get(i));
            if (i != rules.size() - 1)
                System.out.print(" | ");
        }
        System.out.println();
    }
}
 
 
class Grammar{
    private ArrayList<NonTerminal> nonTerminals;
 
    public Grammar() {
        nonTerminals = new ArrayList<>();
    }
 
    public void addRule(String rule) {
        boolean nt = false;
        String parse = "";
 
        for (int i = 0; i < rule.length(); i++){
            char c = rule.charAt(i);
            if (c == ' ') {
                if (!nt) {
                    NonTerminal newNonTerminal = new NonTerminal(parse);
                    nonTerminals.add(newNonTerminal);
                    nt = true;
                    parse = "";
                } else if (parse.length() != 0){
                    nonTerminals.get(nonTerminals.size() - 1).addRule(parse);
                    parse = "";
                }
            }else if (c != '|' && c != '-' && c != '>'){
                parse += c;
            }
        }
        if (parse.length() != 0){
            nonTerminals.get(nonTerminals.size() - 1).addRule(parse);
        }
    }
 
    public void inputData() {
        addRule("S -> Sa | Sb | c | d");
    }
 
    public void solveNonImmediateLR(NonTerminal A, NonTerminal B) {
        String nameA = A.getName();
        String nameB = B.getName();
 
        ArrayList<String> rulesA = A.getRules();
        ArrayList<String> rulesB = B.getRules();
        ArrayList<String> newRulesA = new ArrayList<>();
 
        for (String rule : rulesA) {
            // startsWith avoids an exception when the rule is shorter than the name
            if (rule.startsWith(nameB)) {
                for (String rule1 : rulesB){
                    newRulesA.add(rule1 + rule.substring(nameB.length()));
                }
            }
            else{
                newRulesA.add(rule);
            }
        }
        A.setRules(newRulesA);
    }
 
    public void solveImmediateLR(NonTerminal A) {
        String name = A.getName();
        String newName = name + "'";
 
        ArrayList<String> alphas= new ArrayList<>();
        ArrayList<String> betas = new ArrayList<>();
        ArrayList<String> rules = A.getRules();
        ArrayList<String> newRulesA = new ArrayList<>();
        ArrayList<String> newRulesA1 = new ArrayList<>();
 
         
 
        // Checks if there is left recursion or not
        for (String rule : rules) {
            // startsWith avoids an exception when the rule is shorter than the name
            if (rule.startsWith(name)){
                alphas.add(rule.substring(name.length()));
            }
            else{
                betas.add(rule);
            }
        }
 
        // If no left recursion, exit
        if (alphas.size() == 0)
            return;
 
        if (betas.size() == 0)
            newRulesA.add(newName);
 
        for (String beta : betas)
            newRulesA.add(beta + newName);
 
        for (String alpha : alphas)
            newRulesA1.add(alpha + newName);
 
        // Amends the original rule
 
        A.setRules(newRulesA);
        newRulesA1.add("\u03B5");
 
        // Adds new production rule
        NonTerminal newNonTerminal = new NonTerminal(newName);
        newNonTerminal.setRules(newRulesA1);
        nonTerminals.add(newNonTerminal);
    }
 
    public void applyAlgorithm() {
        int size = nonTerminals.size();
        for (int i = 0; i < size; i++){
            for (int j = 0; j < i; j++){
                solveNonImmediateLR(nonTerminals.get(i), nonTerminals.get(j));
            }
            solveImmediateLR(nonTerminals.get(i));
        }
    }
 
    void printRules() {
        for (NonTerminal nonTerminal : nonTerminals){
            nonTerminal.printRule();
        }
    }
     
 
 
}
class Main{
    public static void main(String[] args) {
        Grammar grammar = new Grammar();
        grammar.inputData();
        grammar.applyAlgorithm();
        grammar.printRules();
    }
}


Python3




class NonTerminal :
    def __init__(self, name) :
        self.name = name
        self.rules = []
    def addRule(self, rule) :
        self.rules.append(rule)
    def setRules(self, rules) :
        self.rules = rules
    def getName(self) :
        return self.name
    def getRules(self) :
        return self.rules
    def printRule(self) :
        print(self.name + " -> ", end = "")
        for i in range(len(self.rules)) :
            print(self.rules[i], end = "")
            if i != len(self.rules) - 1 :
                print(" | ", end = "")
        print()
         
         
class Grammar :
    def __init__(self) :
        self.nonTerminals = []
 
    def addRule(self, rule) :
        nt = False
        parse = ""
 
        for i in range(len(rule)) :
            c = rule[i]
            if c == ' ' :
                if not nt :
                    newNonTerminal = NonTerminal(parse)
                    self.nonTerminals.append(newNonTerminal)
                    nt = True
                    parse = ""
                elif parse != "" :
                    self.nonTerminals[len(self.nonTerminals) - 1].addRule(parse)
                    parse = ""
            elif c != '|' and c != '-' and c != '>' :
                parse += c
        if parse != "" :
            self.nonTerminals[len(self.nonTerminals) - 1].addRule(parse)
 
    def inputData(self) :
        self.addRule("S -> Sa | Sb | c | d")
 
    def solveNonImmediateLR(self, A, B) :
        nameA = A.getName()
        nameB = B.getName()
 
        rulesA = A.getRules()
        rulesB = B.getRules()
        newRulesA = []
 
        for rule in rulesA :
            if rule[0 : len(nameB)] == nameB :
                for rule1 in rulesB :
                    newRulesA.append(rule1 + rule[len(nameB) : ])
            else :
                newRulesA.append(rule)
        A.setRules(newRulesA)
 
    def solveImmediateLR(self, A) :
        name = A.getName()
        newName = name + "'"
 
        alphas = []
        betas = []
        rules = A.getRules()
        newRulesA = []
        newRulesA1 = []
 
 
        # Checks if there is left recursion or not
        for rule in rules :
            if rule[0 : len(name)] == name :
                alphas.append(rule[len(name) : ])
            else :
                betas.append(rule)
 
        # If no left recursion, exit
        if len(alphas) == 0 :
            return
 
        if len(betas) == 0 :
            newRulesA.append(newName)
 
        for beta in betas :
            newRulesA.append(beta + newName)
 
        for alpha in alphas :
            newRulesA1.append(alpha + newName)
 
        # Amends the original rule
 
        A.setRules(newRulesA)
        newRulesA1.append("\u03B5")
 
        # Adds new production rule
        newNonTerminal = NonTerminal(newName)
        newNonTerminal.setRules(newRulesA1)
        self.nonTerminals.append(newNonTerminal)
 
    def applyAlgorithm(self) :
        size = len(self.nonTerminals)
        for i in range(size) :
            for j in range(i) :
                self.solveNonImmediateLR(self.nonTerminals[i], self.nonTerminals[j])
            self.solveImmediateLR(self.nonTerminals[i])
 
    def printRules(self) :
        for nonTerminal in self.nonTerminals :
            nonTerminal.printRule()
 
             
grammar = Grammar()
grammar.inputData()
grammar.applyAlgorithm()
grammar.printRules()


Output

S -> cS' | dS' 
S' -> aS' | bS' | ε 

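Time Complexity: For a grammar with only immediate left recursion, as in the example above, the algorithm runs in O(n*s), where n is the number of production rules and s is the maximum string length of a rule. When indirect left recursion is present, each substitution step can multiply the number of alternatives, so the transformed grammar, and with it the running time, can grow much larger in the worst case.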



Last Updated : 18 Apr, 2023