BNF Notation in Compiler Design

BNF stands for Backus-Naur Form. It is a formal notation for describing the syntax of programming languages, introduced by John Backus and Peter Naur around 1960. BNF and context-free grammars (CFGs) are nearly identical. BNF is a meta-language (a language that can describe another language) for programming languages.

Grammars meant for human readers need a proper notation for writing them down, and Backus-Naur Form (BNF) is one such notation. Different languages have different descriptions and rules, but the general structure of a BNF rule is given below:

name ::= expansion

The symbol ::= means “may expand into” or “may be replaced with.” In some texts, a name is also called a non-terminal symbol.

  • Every name in Backus-Naur form is surrounded by angle brackets, < >, whether it appears on the left- or right-hand side of the rule.
  • An expansion is an expression containing terminal symbols and non-terminal symbols, joined together by sequencing and selection.
  • A terminal symbol may be a literal (like “+” or “function”) or a category of literals (like integer).
  • Simply juxtaposing expressions indicates sequencing.
  • A vertical bar | indicates choice.



Examples :

<expr> ::= <term> "+" <expr>
        |  <term>

<term> ::= <factor> "*" <term>
        |  <factor>

<factor> ::= "(" <expr> ")"
          |  <const>

<const> ::= integer
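
As a minimal sketch of how such a grammar can drive a parser, the Python code below writes one method per non-terminal (a recursive-descent parser) and evaluates the expression while parsing. The names `tokenize`, `Parser`, and `evaluate` are illustrative assumptions, not part of BNF, and `integer` is assumed to mean a run of decimal digits.

```python
import re

def tokenize(text):
    """Split input into integer literals and the terminals + * ( )."""
    tokens = re.findall(r"\d+|[+*()]", text)
    # Anything other than tokens and whitespace is rejected.
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise SyntaxError("unexpected character in input")
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):
        # <expr> ::= <term> "+" <expr> | <term>
        value = self.term()
        if self.peek() == "+":
            self.pos += 1
            value += self.expr()
        return value

    def term(self):
        # <term> ::= <factor> "*" <term> | <factor>
        value = self.factor()
        if self.peek() == "*":
            self.pos += 1
            value *= self.term()
        return value

    def factor(self):
        # <factor> ::= "(" <expr> ")" | <const>
        if self.peek() == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return value
        return self.const()

    def const(self):
        # <const> ::= integer
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected an integer, found {tok!r}")
        self.pos += 1
        return int(tok)

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return value

print(evaluate("2+3*4"))    # 14
print(evaluate("(2+3)*4"))  # 20
```

Because `<term>` sits below `<expr>` in the grammar, multiplication binds more tightly than addition without any extra precedence logic.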



Rules for making BNF :
Naturally, we can define a grammar for BNF rules themselves:



rule → name ::= expansion
name → < identifier >
expansion → expansion expansion
expansion → expansion | expansion
expansion → name
expansion → terminal
  • We might define identifiers using the regular expression [-A-Za-z_0-9]+.
  • A terminal could be a quoted literal (like “+”, “switch” or “<<=”) or the name of a category of literals (like integer).
  • The name of a category of literals is typically defined by other means, like a regular expression or even prose.
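
The grammar above can be turned into a small reader for BNF rule text. The following is a rough sketch rather than a complete BNF parser: the function name `parse_rule` and the token names are assumptions made for illustration, quoted literals are assumed to use plain double quotes, and characters that match no token are silently skipped.

```python
import re

# One alternative per kind of symbol that may appear in a rule.
TOKEN = re.compile(r'''
    (?P<name><[-A-Za-z_0-9]+>)      # non-terminal: < identifier >
  | (?P<literal>"[^"]*")            # quoted literal terminal
  | (?P<category>[-A-Za-z_0-9]+)    # category of literals, e.g. integer
  | (?P<defines>::=)                # separates name from expansion
  | (?P<bar>\|)                     # choice
''', re.VERBOSE)

def parse_rule(text):
    """Return (name, alternatives); each alternative is a list of symbols."""
    tokens = [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]
    if len(tokens) < 2 or tokens[0][0] != "name" or tokens[1][0] != "defines":
        raise ValueError("a rule must start with <name> ::=")
    alternatives = [[]]
    for kind, value in tokens[2:]:
        if kind == "bar":
            alternatives.append([])          # start the next choice
        elif kind == "defines":
            raise ValueError("unexpected ::= inside an expansion")
        else:
            alternatives[-1].append(value)   # sequencing by juxtaposition
    return tokens[0][1], alternatives

name, alts = parse_rule('<expr> ::= <term> "+" <expr> | <term>')
print(name)  # <expr>
print(alts)  # [['<term>', '"+"', '<expr>'], ['<term>']]
```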



It is common to find regular-expression-like operations inside grammars. For example, the Python lexical specification uses them. In these grammars:

postfix * means "repeated 0 or more times"
postfix + means "repeated 1 or more times"
postfix ? means "0 or 1 times"
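
These three postfix operators behave the same way in Python's `re` module, which gives a quick way to try them out. The helper `matches` below is a hypothetical name introduced only for this illustration.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# b* : "b" repeated 0 or more times
print(matches(r"ab*", "a"), matches(r"ab*", "abbb"))  # True True
# b+ : "b" repeated 1 or more times
print(matches(r"ab+", "a"), matches(r"ab+", "abbb"))  # False True
# b? : "b" 0 or 1 times
print(matches(r"ab?", "ab"), matches(r"ab?", "abb"))  # True False
```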



The definition of floating-point literals in Python is a good example of combining several notations:

floatnumber   ::=  pointfloat | exponentfloat
pointfloat    ::=  [intpart] fraction | intpart "."
exponentfloat ::=  (intpart | pointfloat) exponent
intpart       ::=  digit+
fraction      ::=  "." digit+
exponent      ::=  ("e" | "E") ["+" | "-"] digit+

It does not use angle brackets around names (like many EBNF notations and ABNF), yet it does use ::= (like BNF). It mixes regular-expression operations like + for non-empty repetition with EBNF conventions like [ ] for optional parts. The grammar for the whole Python language uses a rather different (but still regular) notation.
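
Since none of these rules is recursive, the floating-point grammar can be translated rule by rule into a single regular expression. The sketch below follows the grammar exactly as quoted here, assuming `digit` means one of 0 to 9; it is not the lexer that Python itself uses, and the helper name `is_float_literal` is an illustrative assumption.

```python
import re

# Each variable mirrors the grammar rule of the same name.
intpart = r"[0-9]+"                                      # digit+
fraction = r"\.[0-9]+"                                   # "." digit+
exponent = r"[eE][+-]?[0-9]+"                            # ("e" | "E") ["+" | "-"] digit+
pointfloat = rf"(?:(?:{intpart})?{fraction}|{intpart}\.)"  # [intpart] fraction | intpart "."
exponentfloat = rf"(?:{intpart}|{pointfloat}){exponent}"   # (intpart | pointfloat) exponent
floatnumber = rf"(?:{pointfloat}|{exponentfloat})"         # pointfloat | exponentfloat

def is_float_literal(text):
    """True if text is derivable from floatnumber in the quoted grammar."""
    return re.fullmatch(floatnumber, text) is not None

for sample in ["3.14", "10.", ".5", "1e10", "3.14e-10", "42", "1e"]:
    print(sample, is_float_literal(sample))
```

The first five samples are accepted; "42" is rejected because the grammar requires either a decimal point or an exponent, and "1e" is rejected because an exponent needs at least one digit.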
