
FIRST Set in Syntax Analysis


FIRST(X) for a grammar symbol X is the set of terminals that begin the strings derivable from X. 

The FIRST set is a concept used in syntax analysis, specifically in building LL and LR parsers. It describes which terminals can begin the strings derived from a grammar symbol, not which terminals can follow it.

The FIRST set of a non-terminal A is defined as the set of terminals that can appear as the first symbol in any string derived from A. If A can derive the empty string (written ε), then ε is also included in the FIRST set of A.

FIRST sets are used to decide which production to apply when expanding a non-terminal in a predictive (LL) parser, and to compute lookahead symbols when building LR parsers. For example, in an LL(1) parser, if the next symbol in the input stream is in the FIRST set of the right-hand side of a production for a non-terminal, then that non-terminal can be expanded using that production.

It is worth noting that the FIRST set is also used in computing the FOLLOW set, which is the set of terminals that can appear immediately after a non-terminal in some derivation. FOLLOW sets are needed in LL(1) parsing when a right-hand side can derive the empty string, and in SLR parsing to decide when to reduce.

To compute the FIRST sets of a grammar, start with every terminal having itself as its FIRST set and every non-terminal having an empty FIRST set. Then, for each production, add the terminals that can begin its right-hand side to the FIRST set of the non-terminal on its left-hand side, looking past any leading symbols that can derive the empty string. Repeat this process until no new element can be added to any set.

The FIRST set is a fundamental concept in syntax analysis, and it is used in many parsing algorithms and techniques.

Rules to compute FIRST set: 

  1. If x is a terminal, then FIRST(x) = { x }
  2. If X -> ε is a production rule, then add ε to FIRST(X).
  3. If X -> Y1 Y2 Y3 … Yn is a production, 
    1. Add FIRST(Y1) – { ε } to FIRST(X).
    2. If FIRST(Y1) contains ε, then also add FIRST(Y2) – { ε } to FIRST(X), and continue in the same way with Y3, …, Yn for as long as the FIRST set of every preceding symbol contains ε.
    3. If FIRST(Yi) contains ε for all i = 1 to n, then add ε to FIRST(X).
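
The rules above can be applied repeatedly until no FIRST set changes. The following is a minimal Python sketch of that iteration, not a definitive implementation. The dictionary layout of the grammar, the names `compute_first` and `EPS`, and the small grammar `toy` are assumptions made for illustration only.

```python
EPS = "ε"  # marker for the empty string (a name chosen for this sketch)

def compute_first(grammar):
    """Return the FIRST set of every non-terminal in `grammar`.

    `grammar` maps each non-terminal to a list of alternatives. Each
    alternative is a list of symbols, and [] stands for an ε-production.
    Any symbol that is not a key of `grammar` is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                          # repeat until no set grows
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                before = len(first[head])
                all_nullable = True
                for sym in body:
                    if sym not in grammar:              # rule 1: terminal
                        first[head].add(sym)
                        all_nullable = False
                        break
                    first[head] |= first[sym] - {EPS}   # rules 3.1 and 3.2
                    if EPS not in first[sym]:
                        all_nullable = False
                        break
                if all_nullable:                        # rules 2 and 3.3
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first

# Hypothetical grammar for illustration: S -> A b, A -> a | ε
toy = {"S": [["A", "b"]], "A": [["a"], []]}
result = compute_first(toy)
print(result)
```

Because A can derive ε in this small grammar, the terminal b that follows it also ends up in FIRST(S).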

Example 1: 

Production Rules of Grammar
E  -> T E’
E’ -> + T E’ | ε
T  -> F T’
T’ -> * F T’ | ε
F  -> ( E ) | id

FIRST sets
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, ε }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, ε }
FIRST(F) = { ( , id }
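
The sets of Example 1 can be checked mechanically. The sketch below computes FIRST by direct recursion with memoization. This shortcut is an assumption that only holds for grammars without left recursion, such as this one; it is not the general algorithm. The names `GRAMMAR`, `first`, and `EPS` are chosen for illustration, and E’ and T’ are spelled with an ASCII apostrophe.

```python
from functools import lru_cache

EPS = "ε"  # marker for the empty string (a name chosen for this sketch)

# Grammar of Example 1; an empty tuple stands for the ε-alternative.
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}

@lru_cache(maxsize=None)
def first(symbol):
    """FIRST of one grammar symbol; would recurse forever on left recursion."""
    if symbol not in GRAMMAR:               # a terminal begins only itself
        return frozenset({symbol})
    result = set()
    for body in GRAMMAR[symbol]:
        for sym in body:
            f = first(sym)
            result |= f - {EPS}
            if EPS not in f:                # sym is not nullable: stop here
                break
        else:
            result.add(EPS)                 # empty body or all symbols nullable
    return frozenset(result)

for nt in GRAMMAR:
    print(nt, sorted(first(nt)))
```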

Example 2: 

Production Rules of Grammar
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | ε
C -> h | ε

FIRST sets
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba)
         = { d, g, h, b, a, ε }
FIRST(A) = { d } U FIRST(BC) 
         = { d, g, h, ε }
FIRST(B) = { g , ε }
FIRST(C) = { h , ε }
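
FIRST(S) in Example 2 is a union of the FIRST sets of three right-hand sides, each of which is a sequence of symbols. As a rough illustration, the sketch below shows how the FIRST set of such a sequence can be derived once the sets of A, B, and C are known, with ε standing for the empty string. The table `FIRST` is copied from the example; the helper name `first_of_sequence` is an assumption of this sketch.

```python
EPS = "ε"  # marker for the empty string (a name chosen for this sketch)

# FIRST sets of the non-terminals A, B, C, taken from Example 2.
FIRST = {
    "A": {"d", "g", "h", EPS},
    "B": {"g", EPS},
    "C": {"h", EPS},
}

def first_of_sequence(symbols):
    """FIRST of a string of grammar symbols Y1 Y2 ... Yn."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= f - {EPS}
        if EPS not in f:            # this symbol cannot vanish: stop
            return result
    result.add(EPS)                 # every symbol can derive ε
    return result

first_acb = first_of_sequence(["A", "C", "B"])
first_cbb = first_of_sequence(["C", "b", "b"])
first_ba = first_of_sequence(["B", "a"])
first_s = first_acb | first_cbb | first_ba
print(sorted(first_s))
```

Only ACB contributes ε to FIRST(S), because the other two alternatives end in terminals that can never be skipped.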

Notes: 

  1. The grammars used above are Context-Free Grammars (CFGs). The syntax of most programming languages can be specified using a CFG.
  2. Each production of a CFG is of the form A -> α, where A is a single Non-Terminal and α is a (possibly empty) string of grammar symbols (i.e. Terminals as well as Non-Terminals).

Features of FIRST sets:

Definition: The FIRST set of a nonterminal symbol is the set of terminal symbols that can appear as the first symbol in a string derived from that nonterminal. In other words, it is the set of all possible starting symbols for a string derived from that nonterminal.

Calculation: The FIRST set for each nonterminal symbol is calculated by examining the productions for that symbol and determining which terminal symbols can appear as the first symbol in a string derived from that production.

Recursive Descent Parsing: The FIRST set is often used in recursive descent parsing, which is a top-down parsing technique that uses the FIRST set to determine which production to use at each step of the parsing process.

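Choice of Production: The FIRST set does not remove ambiguity from a grammar, but it gives a predictive parser a way to choose among the alternatives of a nonterminal based on the next input symbol. If the FIRST sets of two alternatives of the same nonterminal overlap, one symbol of lookahead cannot decide between them, and the grammar is not LL(1).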

Follow Set: The FOLLOW set is another concept used in syntax analysis that represents the set of symbols that can appear immediately after a nonterminal symbol in a derivation. The FOLLOW set is often used in conjunction with the FIRST set to resolve parsing conflicts and ensure that the parser can correctly identify the structure of the input code.
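
To make the recursive descent use concrete, here is a small recognizer sketch for the grammar of Example 1. Each function tests the next token against the FIRST set of an alternative before expanding it. The token representation (a list of strings), the end marker "$", and all function names are assumptions of this sketch. For brevity, the ε-alternatives are taken whenever the next token is not in the FIRST set of the other alternative, so some errors are reported slightly later than a parser that also checks FOLLOW sets would report them.

```python
def parse(tokens):
    """Return True if `tokens` is a sentence of the Example 1 grammar."""
    toks = list(tokens) + ["$"]     # "$" is an end marker assumed here
    pos = 0

    def peek():
        return toks[pos]

    def eat(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, got {toks[pos]!r}")
        pos += 1

    def E():                        # E -> T E'
        T()
        E_prime()

    def E_prime():                  # E' -> + T E' | ε
        if peek() == "+":           # "+" is the FIRST set of + T E'
            eat("+")
            T()
            E_prime()
        # otherwise choose the ε-alternative and consume nothing

    def T():                        # T -> F T'
        F()
        T_prime()

    def T_prime():                  # T' -> * F T' | ε
        if peek() == "*":           # "*" is the FIRST set of * F T'
            eat("*")
            F()
            T_prime()

    def F():                        # F -> ( E ) | id
        if peek() == "(":
            eat("(")
            E()
            eat(")")
        elif peek() == "id":
            eat("id")
        else:                       # token is not in FIRST(F) = { (, id }
            raise SyntaxError(f"unexpected token {peek()!r}")

    try:
        E()
        eat("$")                    # all input must be consumed
        return True
    except SyntaxError:
        return False

print(parse(["id", "+", "id", "*", "id"]))
print(parse(["id", "+"]))
```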

Advantages and Disadvantages:

Advantages of using FIRST set in syntax analysis include:

  • Improved parsing: FIRST sets let an LL parser pick the production for a non-terminal from the next input symbol alone, so it does not need to backtrack. LR parser generators use them to compute lookahead symbols.
  • Conflict detection: FIRST sets show when multiple production rules for the same non-terminal can begin with the same terminal. Such an overlap means one symbol of lookahead is not enough, and the grammar may need to be rewritten, for example by left factoring.
  • Simplified error handling: If the next input symbol is not in the FIRST set of any alternative the parser could choose, the parser can report a syntax error at that point instead of failing later.

Disadvantages of using FIRST set in syntax analysis include:

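  • Complexity: Computing FIRST sets by hand can be tedious and error-prone, especially for grammars with many non-terminals and production rules, because the sets depend on one another and must be recomputed until they stop changing.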
  • Limited applicability: FIRST set is mainly used in LL and LR parsing algorithms, and may not be applicable to other types of parsing algorithms.
  • Limitations of LL parsing: LL parsing is limited in its ability to handle certain types of grammars, such as those with left-recursive rules, which can lead to an infinite loop in the parser.

Overall, the use of FIRST set in syntax analysis can improve the accuracy and efficiency of the parsing process, but it should be balanced against the complexity and limitations of the parsing algorithm being used.
In the next article “FOLLOW sets in Compiler Design” we will see how to compute Follow sets. 

This article is compiled by Vaibhav Bajpai.


Last Updated : 10 Apr, 2023