Last Minute Notes – Compiler Design


Phases of Compiler:

Symbol Table : It is a data structure used and maintained by the compiler. It holds the names of all identifiers along with their types, and lets the compiler look up identifiers quickly.
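A symbol table can be sketched as a stack of dictionaries. This is only an illustration: the class name `SymbolTable`, its methods, and the use of nested scopes are assumptions made for the example, not something these notes prescribe.

```python
class SymbolTable:
    """Hypothetical scoped symbol table: maps identifier names to types."""

    def __init__(self):
        self.scopes = [{}]  # innermost scope is the last element

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = type_

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None  # undeclared identifier


table = SymbolTable()
table.declare("count", "int")
table.enter_scope()
table.declare("count", "float")  # shadows the outer declaration
inner_type = table.lookup("count")
table.exit_scope()
outer_type = table.lookup("count")
```

Dictionary lookups take constant time on average, which is why hash-based tables are a common choice here.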

  1. Lexical Analysis : The lexical analyzer reads the source program character by character and groups the characters into tokens. Tokens can be identifiers, keywords, operators, separators, etc.
  2. Syntax Analysis : The syntax analyzer, also known as the parser, takes the tokens one by one and uses a Context Free Grammar to construct the parse tree.
  3. Semantic Analysis : The semantic analyzer checks whether the parse tree is meaningful (for example, by type checking) and produces a verified parse tree.
  4. Intermediate Code Generator : It generates intermediate code, a machine-independent form that can be readily translated into target machine code. Several intermediate representations are in common use, such as three-address code.
  5. Code Optimizer : It transforms the code so that it consumes fewer resources and runs faster.
  6. Target Code Generator : Its main purpose is to produce code that the target machine can understand. The output depends on the type of assembler.
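The lexical analysis phase in the list above can be sketched with regular expressions. The token classes and the tiny keyword set below are assumptions chosen for the example; a real language defines its own.

```python
import re

# Hypothetical token classes for a tiny C-like language (illustrative only).
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"==|<=|>=|!=|[+\-*/=<>]"),
    ("SEPARATOR",  r"[;,(){}]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan the source left to right and return a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # A lexical error: no token class matches at this position.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("int x = 42;")
```

The resulting token stream is what the parser consumes in the next phase.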

Error handling :
The tasks of the error handling process are to detect each error, report it to the user, and then choose and apply a recovery strategy so that compilation can continue. Errors correspond to blank entries in the symbol table. There are two types of error :
Run-Time Error : A run-time error occurs during the execution of a program, usually because of adverse system parameters or invalid input data.
Compile-Time Error : Compile-time errors arise at compile time, before the program is executed. They are of the following kinds :

  1. Lexical : misspellings of identifiers, keywords or operators.
  2. Syntactical : a missing semicolon or unbalanced parentheses.
  3. Semantical : incompatible value assignment or type mismatches between operator and operand.
  4. Logical : unreachable code, infinite loops.

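Left Recursion : A grammar of the form A -> Aα | β (where α and β are strings of grammar symbols and β does not start with A) is left recursive, because A can derive a string that begins with A itself. Top-down parsing techniques cannot handle left-recursive grammars, so we convert left recursion into right recursion.
Left recursion elimination : A -> Aα | β ⇒ A -> βA’
A’ -> αA’ | ε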
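The elimination rule can be sketched in Python as follows. The sketch handles only immediate left recursion and uses the standard form of the rule, in which the new nonterminal gets an ε alternative. The function name and the list-of-symbols representation are assumptions made for the example.

```python
EPSILON = "ε"


def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    Each alternative is a list of symbols. Returns a dict mapping each
    nonterminal to its list of alternatives.
    """
    # α parts: alternatives that start with the nonterminal itself.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # β parts: all remaining alternatives.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}  # nothing to do
    fresh = nonterminal + "'"
    return {
        nonterminal: [[s for s in beta if s != EPSILON] + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[EPSILON]],
    }


# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

Indirect left recursion (for example A -> B x, B -> A y) needs an extra substitution step that this sketch does not attempt.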

Left Factoring : If the alternatives on the r.h.s. of a nonterminal share a common prefix, the grammar needs to be left factored by pulling out the common prefix as follows :
A -> ab1 | ac2 ⇒ A -> aA’
A’ -> b1 | c2
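A single left-factoring step might look like the sketch below. It assumes every alternative is a non-empty list of symbols and groups alternatives by their first symbol; the function names are hypothetical.

```python
EPSILON = "ε"


def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor_once(nonterminal, alternatives):
    """Factor out common prefixes among the alternatives of one nonterminal."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0], []).append(alt)  # group by first symbol

    result = {nonterminal: []}
    counter = 0
    for group in groups.values():
        if len(group) == 1:
            result[nonterminal].append(group[0])  # no shared prefix
            continue
        prefix = common_prefix(group)
        counter += 1
        fresh = nonterminal + "'" * counter  # A', A'', ...
        result[nonterminal].append(prefix + [fresh])
        # What remains after the prefix; an empty remainder becomes ε.
        result[fresh] = [alt[len(prefix):] or [EPSILON] for alt in group]
    return result


factored = left_factor_once("A", [["a", "b1"], ["a", "c2"]])
```

The new nonterminals may themselves need factoring, so in practice the step is repeated until no alternatives share a prefix.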

FIRST(A) is the set of terminal symbols that occur as the first symbol of the strings derived from A. If A can derive the empty string, ε is also in FIRST(A).

FOLLOW(A) is the set of terminals that can occur immediately after the nonterminal A in strings derived from the start symbol. The end marker $ is in FOLLOW of the start symbol.
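Both sets can be computed by iterating to a fixed point. The following is a minimal sketch, assuming the grammar is given as a dict from each nonterminal to a list of alternatives, with an empty list standing for an ε-production; the function names and this representation are assumptions for the example.

```python
EPSILON = "ε"
END = "$"


def first_of_sequence(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each symbol."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result
    result.add(EPSILON)  # every symbol can vanish (or the sequence is empty)
    return result


def compute_first_follow(grammar, start):
    symbols = {s for alts in grammar.values() for alt in alts for s in alt}
    first = {s: {s} for s in symbols if s not in grammar}  # FIRST(terminal)
    first.update({nt: set() for nt in grammar})
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)

    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for head, alts in grammar.items():
            for alt in alts:
                new_first = first_of_sequence(alt, first)
                if not new_first <= first[head]:
                    first[head] |= new_first
                    changed = True
                for i, symbol in enumerate(alt):
                    if symbol not in grammar:
                        continue  # FOLLOW is defined for nonterminals only
                    tail = first_of_sequence(alt[i + 1:], first)
                    new_follow = tail - {EPSILON}
                    if EPSILON in tail:  # the rest of the body can vanish
                        new_follow |= follow[head]
                    if not new_follow <= follow[symbol]:
                        follow[symbol] |= new_follow
                        changed = True
    return first, follow


# E -> T E',  E' -> + T E' | ε,  T -> id
grammar = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []], "T": [["id"]]}
first, follow = compute_first_follow(grammar, "E")
```

For this grammar the sketch gives FIRST(E’) = {+, ε} and FOLLOW(T) = {+, $}.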


LL(1) Parser : An LL(1) grammar is always unambiguous, left factored and free of left recursion.
To check whether a grammar is LL(1) or not :
1. If A -> B1 | C2, then FIRST(B1) ∩ FIRST(C2) = φ must hold.
2. If A -> B | ε, then FIRST(B) ∩ FOLLOW(A) = φ must hold.
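The two conditions can be checked mechanically once the FIRST and FOLLOW sets are known. The sketch below checks a single nonterminal and assumes the sets have already been computed elsewhere; the function name is hypothetical.

```python
from itertools import combinations

EPSILON = "ε"


def is_ll1_nonterminal(first_sets, follow_set):
    """first_sets: the FIRST set of each alternative of one nonterminal A.
    follow_set: FOLLOW(A). Returns True if both LL(1) conditions hold."""
    # Condition 1: FIRST sets of any two alternatives are disjoint.
    for x, y in combinations(first_sets, 2):
        if x & y:
            return False
    # Condition 2: if one alternative can derive ε, no other alternative
    # may start with a terminal from FOLLOW(A).
    for i, nullable in enumerate(first_sets):
        if EPSILON not in nullable:
            continue
        for j, other in enumerate(first_sets):
            if i != j and other & follow_set:
                return False
    return True


# E' -> + T E' | ε  with FOLLOW(E') = {$, )}
ok = is_ll1_nonterminal([{"+"}, {EPSILON}], {"$", ")"})
# A -> a | ε  with FOLLOW(A) = {a}: the parser cannot decide on input a
clash = is_ll1_nonterminal([{"a"}, {EPSILON}], {"a"})
```

A grammar is LL(1) when the check succeeds for every nonterminal.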


LR(0) Parser : The closure() and goto() functions are used to create the canonical collection of LR(0) items.
Conflicts in LR(0) parser :
1. Shift-Reduce (SR) conflict : the same state of the DFA contains both a shift item and a reduce item, e.g. A -> B . xC (shift) and B -> a . (reduce).
2. Reduce-Reduce (RR) conflict : the same state of the DFA contains two reduce items, e.g. A -> a . (reduce) and B -> b . (reduce).
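The closure and goto functions, together with a conflict check for one state, can be sketched as below. An item is represented as a tuple (head, body, dot position). This representation, the helper `conflicts`, and the choice to ignore the completed item of the augmented start symbol (treated as accept rather than reduce) are assumptions of the example.

```python
def closure(items, grammar):
    """Add X -> . γ for every nonterminal X that appears right after a dot."""
    result = set(items)
    worklist = list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in grammar:
            for alt in grammar[body[dot]]:
                item = (body[dot], tuple(alt), 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol, grammar):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved, grammar)


def conflicts(state, grammar, start):
    """Return the kinds of LR(0) conflict ("SR", "RR") present in one state."""
    reduces = [i for i in state if i[2] == len(i[1]) and i[0] != start]
    shifts = [i for i in state if i[2] < len(i[1]) and i[1][i[2]] not in grammar]
    found = set()
    if reduces and shifts:
        found.add("SR")
    if len(reduces) > 1:
        found.add("RR")
    return found


# Augmented grammar: S' -> S,  S -> a S | a
grammar = {"S'": [["S"]], "S": [["a", "S"], ["a"]]}
state0 = closure({("S'", ("S",), 0)}, grammar)
state1 = goto(state0, "a", grammar)  # contains S -> a . and S -> . a S
```

Here `state1` holds both a completed item and an item with the dot before the terminal a, so this grammar is not LR(0).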

SLR Parser : It is more powerful than LR(0).
Every LR(0) grammar is SLR, but every SLR grammar need not be LR(0).
Conflicts in SLR :
1. SR conflict : A -> B . xC (shift) and B -> a . (reduce) in the same state, if FOLLOW(B) ∩ {x} ≠ φ
2. RR conflict : A -> a . (reduce) and B -> b . (reduce) in the same state, if FOLLOW(A) ∩ FOLLOW(B) ≠ φ

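CLR Parser : It is the same as the SLR parser except that the reduce entries in the CLR parsing table are placed only under the lookahead symbols of the LR(1) item, rather than under every terminal in the FOLLOW of the l.h.s. nonterminal.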

LALR Parser : It is constructed from the CLR parser. If two CLR states have the same productions (the same LR(0) items) but different lookaheads, they are merged into a single LALR state.
Every LALR grammar is CLR, but every CLR grammar need not be LALR.

Parsers Comparison :

LR(0) ⊂ SLR ⊂ LALR ⊂ CLR
LL(1) ⊂ CLR (every LL(1) grammar is CLR, but an LL(1) grammar is not guaranteed to be LALR)
If the number of states in LR(0) = n1, in SLR = n2, in LALR = n3 and in CLR = n4, then n1 = n2 = n3 ≤ n4

Syntax Directed Translation : A Syntax Directed Translation (SDT) augments the grammar rules with semantic actions that facilitate semantic analysis.
Eg – S -> AB {print (*)}
A -> a {print (1)}
B -> b {print (2)}
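To see the order in which these actions fire, here is a small sketch that parses the input ab with the grammar above and runs each action when its production has been fully matched, as a bottom-up parser would on a reduction. It collects the printed symbols into a string instead of printing them; the function names are made up for the example.

```python
def translate(tokens):
    """Parse S -> A B, A -> a, B -> b and run each action when its
    production has been completely recognised."""
    output = []
    pos = 0

    def expect(terminal):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != terminal:
            raise SyntaxError(f"expected {terminal!r} at position {pos}")
        pos += 1

    def parse_A():
        expect("a")
        output.append("1")  # action of A -> a

    def parse_B():
        expect("b")
        output.append("2")  # action of B -> b

    def parse_S():
        parse_A()
        parse_B()
        output.append("*")  # action of S -> A B, runs after both children

    parse_S()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return "".join(output)


printed = translate(["a", "b"])
```

The actions of the children run before the action of the parent, so the output for ab is 12*.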
Synthesized Attribute : attribute whose value is evaluated in terms of attribute values of its children.
Inherited Attribute : attribute whose value is evaluated in terms of attribute values of siblings or parents.

S-attributed SDT : If an SDT uses only synthesized attributes, it is called an S-attributed SDT. S-attributed SDTs are evaluated during bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes.
L-attributed SDT : If an SDT uses synthesized attributes, or inherited attributes with the restriction that a symbol can inherit values only from its parent or its left siblings, it is called an L-attributed SDT. Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.

Activation Record : Information needed by a single execution of a procedure is managed using a contiguous block of storage called activation record. An activation record is allocated when a procedure is entered and it is deallocated when that procedure is exited.

Intermediate Code : Intermediate code is machine independent. It can be represented as syntax trees, postfix notation or three-address code.

Three address code:
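1. Quadruples (4 fields : operator, operand1, operand2, result)
2. Triples (3 fields : operator, operand1, operand2; a result is referred to by the position of its triple)
3. Indirect triples (a separate list of pointers to triples, so statements can be reordered without renumbering)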
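The difference between the first two representations is easiest to see on one expression. The sketch below generates both forms for b * c + d, assuming the expression is given as a nested tuple (operator, left, right); the temporary names t1, t2 and the function names are choices made for the example.

```python
def to_quadruples(expr):
    """Return a list of (operator, operand1, operand2, result) tuples."""
    quads = []

    def visit(node):
        if isinstance(node, str):
            return node  # a plain variable name
        op, left, right = node
        l, r = visit(left), visit(right)
        temp = f"t{len(quads) + 1}"  # fresh temporary for this result
        quads.append((op, l, r, temp))
        return temp

    visit(expr)
    return quads


def to_triples(expr):
    """Return a list of (operator, operand1, operand2) tuples; an int
    operand refers to the result of the triple at that position."""
    triples = []

    def visit(node):
        if isinstance(node, str):
            return node
        op, left, right = node
        l, r = visit(left), visit(right)
        triples.append((op, l, r))
        return len(triples) - 1  # result is named by position

    visit(expr)
    return triples


expr = ("+", ("*", "b", "c"), "d")  # b * c + d
quads = to_quadruples(expr)
triples = to_triples(expr)
```

Quadruples name every result explicitly, while triples save the result field at the cost of making statements position dependent.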

Code Optimization :

Types of machine independent optimizations –
1. Loop optimizations :

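  • Code motion : move loop-invariant expressions out of the loop to reduce how often they are evaluated.
  • Loop unrolling : replicate the loop body so that fewer iterations (and fewer loop tests) are executed.
  • Loop jamming : combine the bodies of two loops when they share the same index and iteration range.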

2. Constant folding : evaluating constant expressions at compile time and replacing them with their values.
3. Constant propagation : replacing a variable that is known to hold a constant with that constant at compile time.
4. Strength reduction : replacing costly operators with cheaper ones.
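Three of these optimizations can be sketched on a list of quadruples. In this sketch, propagation means substituting a variable whose constant value is known, folding means computing an operation whose operands are both constants, and strength reduction rewrites x * 2 as x + x. It assumes straight-line code with no branches or loops, and the quadruple layout (op, arg1, arg2, result) with ("=", value, None, result) as a copy is an assumption of the example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def optimize(quads):
    """Single forward pass over straight-line three-address code."""
    constants = {}  # variable name -> known constant value
    out = []
    for op, a, b, result in quads:
        # Constant propagation: substitute known constant values.
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "=" and isinstance(a, int):
            constants[result] = a
        elif op in OPS and isinstance(a, int) and isinstance(b, int):
            # Constant folding: compute the value at compile time.
            constants[result] = OPS[op](a, b)
            op, a, b = "=", constants[result], None
        else:
            constants.pop(result, None)  # result is no longer a known constant
            if op == "*" and b == 2:
                op, b = "+", a  # strength reduction: x * 2 -> x + x
        out.append((op, a, b, result))
    return out


code = [
    ("=", 4, None, "n"),
    ("*", "n", 2, "t1"),
    ("+", "x", "t1", "t2"),
    ("*", "t2", 2, "t3"),
]
optimized = optimize(code)
```

After the pass, t1 is the constant 8, the addition uses 8 directly, and the last multiplication has become an addition.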
