Various Data Structures Used in Compiler

Last Updated : 11 Jul, 2023

A compiler is a program that translates a high-level language (HLL) into a low-level language (LLL) such as machine code. Each phase of the compiler relies on data structures to do its work. This article discusses the main ones.

The main data structures used in compilers are:

Tokens
Syntax Tree
Symbol Table
Literal Table
Parse Tree

1. Tokens

When a scanner groups the characters of the input into a token, it usually represents the token symbolically, as a value of an enumerated data type whose members are the set of tokens in the source language. It is also important to keep the character string itself (the lexeme) together with any information derived from it.
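As a minimal sketch of this idea, the Python code below pairs an enumerated token type with the original character string. The names `TokenType`, `Token`, and `tokenize`, and the tiny set of token patterns, are assumptions made for illustration and are not taken from any particular compiler.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    # Hypothetical token set for a tiny expression language.
    IDENT = auto()
    NUMBER = auto()
    PLUS = auto()
    STAR = auto()


@dataclass(frozen=True)
class Token:
    type: TokenType  # symbolic representation of the token
    lexeme: str      # the character string it was built from


# One named group per token type; the group name selects the enum member.
TOKEN_PATTERNS = [
    (TokenType.NUMBER, r"\d+"),
    (TokenType.IDENT, r"[A-Za-z_]\w*"),
    (TokenType.PLUS, r"\+"),
    (TokenType.STAR, r"\*"),
]
MASTER = re.compile("|".join(f"(?P<{t.name}>{p})" for t, p in TOKEN_PATTERNS))


def tokenize(source):
    """Gather the characters of `source` into a list of tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append(Token(TokenType[match.lastgroup], match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("a + b * c")
```

Each `Token` keeps both the symbolic type, which the parser works with, and the lexeme, which later phases need for names and values.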

2. Syntax Tree

A syntax tree is a tree data structure in which each leaf node represents an operand and each interior node represents an operator. When the parser generates a syntax tree, it usually builds it as a dynamically allocated, pointer-based structure that grows as parsing proceeds.

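For example, in the syntax tree for a+b*c the root is +, its left child is a, and its right child is *, whose children are b and c.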
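The tree for a+b*c can be sketched in Python as follows. The class names `Operand` and `BinaryOp` and the `evaluate` helper are illustrative assumptions. The tree is built by hand here, where a real parser would build it while parsing.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Operand:
    name: str  # leaf node: an operand such as a variable name


@dataclass
class BinaryOp:
    op: str        # interior node: the operator
    left: "Node"   # reference to the left subtree
    right: "Node"  # reference to the right subtree


Node = Union[Operand, BinaryOp]

# a + b * c: '*' binds tighter, so it sits below '+'.
tree = BinaryOp("+", Operand("a"), BinaryOp("*", Operand("b"), Operand("c")))


def evaluate(node, env):
    """Walk the tree bottom-up, looking operand values up in `env`."""
    if isinstance(node, Operand):
        return env[node.name]
    left = evaluate(node.left, env)
    right = evaluate(node.right, env)
    if node.op == "+":
        return left + right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator {node.op!r}")
```

With a = 1, b = 2, and c = 3, `evaluate(tree, {"a": 1, "b": 2, "c": 3})` multiplies b and c first and then adds a, because the shape of the tree encodes operator precedence.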

3. Symbol Table

The symbol table is a data structure that keeps information about identifiers: functions, variables, constants, and data types. It is created and maintained by the compiler, which uses it to record each occurrence of these entities. The symbol table is used in almost every phase of the compiler, as the diagram of the phases of a compiler shows. The scanner, parser, and semantic analyzer may enter identifiers into the symbol table, and the optimization and code generation phases consult it to make appropriate decisions. Because the table is accessed so often, its insertion, deletion, and lookup operations must be efficient. For this reason, it is usually implemented as a hash table.
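A minimal sketch of a hash-based symbol table is shown below, using Python dictionaries (which are hash tables). The class name `SymbolTable`, the attribute names, and the stack of scopes are assumptions made for illustration; the text above does not prescribe any particular scoping scheme.

```python
class SymbolTable:
    """Stack of hash tables, one per scope (an illustrative design)."""

    def __init__(self):
        self.scopes = [{}]  # the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        # Deletion: all identifiers of the innermost scope go away at once.
        self.scopes.pop()

    def insert(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("x", kind="variable", type="int")
table.enter_scope()
table.insert("x", kind="variable", type="float")  # shadows the outer x
inner_type = table.lookup("x")["type"]
table.exit_scope()
outer_type = table.lookup("x")["type"]
```

Insertion and lookup each take constant time on average per scope, which matches the efficiency requirement described above.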

4. Literal Table

A literal table is a data structure that keeps track of the literals in the program. It holds the constants and strings used in the program, and each one appears in the table only once. Its contents apply to the whole program, which is why deletions are not necessary. By allowing constants and strings to be reused, the literal table plays an important role in reducing the size of the program.
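The "stored only once" behaviour can be sketched as below. The class name `LiteralTable` and its `intern` and `get` methods are hypothetical names chosen for this example.

```python
class LiteralTable:
    """Stores each constant or string once and hands out its index."""

    def __init__(self):
        self._index = {}   # (type, value) -> position in _values
        self._values = []  # literals in order of first appearance

    def intern(self, value):
        # Include the type in the key so that 1 and 1.0 stay distinct.
        key = (type(value), value)
        if key not in self._index:
            self._index[key] = len(self._values)
            self._values.append(value)
        return self._index[key]

    def get(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


literals = LiteralTable()
first = literals.intern("hello")
second = literals.intern(42)
again = literals.intern("hello")  # reuses the existing entry
```

There is no delete operation, since entries stay valid for the whole program. Repeated literals share one entry, which is where the saving in program size comes from.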

5. Parse Tree

A parse tree is a hierarchical representation of the terminal and non-terminal symbols used in a derivation. The string is derived from the start symbol, which is the root of the parse tree. All the leaf nodes are terminals, and the interior nodes are non-terminals. Reading the leaves from left to right gives back the original input string.

For example, in the parse tree for a+b*c the leaves, read from left to right, are a, +, b, *, and c.
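A parse tree for a+b*c can be sketched as below. The grammar (E -> E + T | T, T -> T * F | F, F -> identifier) is an assumption, since the text does not fix one, and the names `ParseNode`, `leaf`, and `frontier` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    symbol: str                                   # terminal or non-terminal
    children: list = field(default_factory=list)  # empty for a leaf


def leaf(symbol):
    return ParseNode(symbol)


# Derivation of a+b*c under the assumed grammar.
tree = ParseNode("E", [
    ParseNode("E", [ParseNode("T", [ParseNode("F", [leaf("a")])])]),
    leaf("+"),
    ParseNode("T", [
        ParseNode("T", [ParseNode("F", [leaf("b")])]),
        leaf("*"),
        ParseNode("F", [leaf("c")]),
    ]),
])


def frontier(node):
    """Collect the leaves from left to right."""
    if not node.children:
        return [node.symbol]
    symbols = []
    for child in node.children:
        symbols.extend(frontier(child))
    return symbols
```

Joining the result of `frontier(tree)` reproduces the input string, and every interior node is one of the non-terminals E, T, or F.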

Intermediate code also needs data structures to store it.

6. Intermediate Code

Once generated, the intermediate code can be stored as a linked list of structures, a text file, or an array of strings, depending on the kind of intermediate code that is generated. We choose the data structure accordingly, so that optimization is easy to carry out on it.
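As one possible illustration, the sketch below stores intermediate code as a list of structures and also renders it as an array of strings. It assumes three-address code in quadruple form, which is only one of several kinds of intermediate code. The names `Quad`, `generate`, and `to_text` are hypothetical.

```python
import itertools
from typing import NamedTuple


class Quad(NamedTuple):
    op: str      # operator
    arg1: str    # first operand
    arg2: str    # second operand
    result: str  # name that receives the value


def generate(expr, code, temps):
    """Emit quadruples for `expr`, a name or an (op, left, right) tuple.

    Returns the name that holds the value of `expr`.
    """
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    arg1 = generate(left, code, temps)
    arg2 = generate(right, code, temps)
    result = f"t{next(temps)}"  # fresh temporary name
    code.append(Quad(op, arg1, arg2, result))
    return result


def to_text(code):
    """Render the same code as an array of strings."""
    return [f"{q.result} = {q.arg1} {q.op} {q.arg2}" for q in code]


code = []
generate(("+", "a", ("*", "b", "c")), code, itertools.count(1))
lines = to_text(code)
```

For a+b*c this yields the two instructions `t1 = b * c` and `t2 = a + t1`. The structured form is convenient for an optimizer to inspect and rewrite, while the string form is convenient for writing to a text file.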

Frequently Asked Questions

Q.1 What is the use of a symbol table?

Answer:

The symbol table is a data structure that keeps information about identifiers: functions, variables, constants, and data types. It is created and maintained by the compiler, which uses it to record each occurrence of these entities.

Q.2 How are the terminals and non-terminals represented in the parse tree?

Answer:

A parse tree is a hierarchical representation of symbols. All the leaf nodes are terminals, and the interior nodes are non-terminals.
