Error Handling in YACC in Compiler Design

Last Updated : 28 Jan, 2023

A compiler is a computer program that translates code written in one programming language (the source language) into another programming language (the target language). The process of compiling involves several stages, including lexical analysis, syntax analysis, semantic analysis, code generation, and code optimization.

The first stage of the compilation process is lexical analysis, also known as scanning. The lexical analyzer reads the source code character by character and groups the characters into tokens, which are the basic elements of the source language. Typical tokens include keywords, identifiers, literals, operators, and punctuation.
The next stage is syntax analysis, also known as parsing. The syntax analyzer uses the tokens produced by the lexical analyzer to construct a parse tree, which represents the syntactic structure of the source code. Constructing the tree is also how grammatical correctness is checked: a token sequence that cannot be derived from the grammar is reported as a syntax error.
The semantic analysis stage checks if the source code adheres to the semantic rules of the language. This includes checking if variable names are declared before they are used and if the data types of variables and expressions match.
The code generation stage takes the parse tree and generates target code, which can be either in the form of an executable program or an intermediate code to be used by other compiler stages. This stage also includes code optimization, which aims to improve the performance of the generated code.
Compiler design is an active area of research and development, with new techniques and tools being developed to improve the efficiency, accuracy, and flexibility of compilers. The goal of compiler design is to create a compiler that can effectively and efficiently translate source code into target code, while also detecting and reporting errors in the source code.

Compilers differ in how they are organized, for example as one-pass or multi-pass compilers, and in the parsing technique they use, such as top-down (LL) or bottom-up (LR) parsing. Compiler design is a complex process that requires a deep understanding of programming languages, computer architecture, and software engineering. YACC, the parser generator described in the next section, is often used together with a lexical analyzer generator such as lex, which produces a scanner that turns the input source code into a stream of tokens. The lexical analyzer and the YACC-generated parser work together to turn the input source code into a more easily processed format.

YACC

YACC (Yet Another Compiler Compiler) is a tool for generating a parser for a specified grammar. It was developed by Stephen C. Johnson at AT&T Bell Laboratories in the 1970s. A parser is a program that takes input in the form of a sequence of tokens and checks whether it conforms to a specified set of rules, called a grammar. A YACC-generated parser does not build a parse tree on its own; as it recognizes each rule it runs the action attached to that rule, and those actions can build a parse tree that represents the structure of the input according to the grammar.

YACC works by taking a file containing a grammar in a specified format and generating C code for a parser that implements the grammar (some implementations, such as GNU Bison, can also generate C++). The user also provides code for actions to be taken when certain grammar rules are recognized, such as creating parse tree nodes or generating code.

The file is typically named with the “.y” file extension.

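YACC generates LALR(1) parsers. LALR stands for Look-Ahead LR, where LR means that the input is scanned from Left to right and a Rightmost derivation is constructed in reverse. This is a type of bottom-up parsing: the parser keeps a stack of states for the input seen so far and uses a table built from the grammar, together with one token of lookahead, to decide whether to shift the next token or reduce by a rule.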

YACC grammars consist of a set of rules, each of which has a left-hand side (LHS) and a right-hand side (RHS). The LHS is a nonterminal symbol, which represents a category of input, and the RHS is a sequence of terminal and nonterminal symbols, which represents a possible sequence of input. For example, a grammar for simple arithmetic expressions might have a rule for an expression, with the LHS being “expression” and the RHS being “term + expression” or “term - expression” or “term”, where “term” is another nonterminal symbol.
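
The expression rule described above could be written in YACC notation roughly as follows. This is a minimal sketch: the NUMBER token and the single rule for term are assumptions added only to make the fragment a complete grammar.

```yacc
%token NUMBER            /* assumed terminal for numeric literals */

%%

/* LHS : RHS alternatives separated by '|', ended by ';' */
expression
    : term '+' expression
    | term '-' expression
    | term
    ;

/* assumed definition of term, not given in the text above */
term
    : NUMBER
    ;
```

The names expression and term are nonterminals because they appear on the left-hand side of a rule, while NUMBER, '+', and '-' are terminals supplied by the lexical analyzer.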

YACC also provides a number of built-in features, such as error recovery and conflict resolution, that allow for more robust and flexible parsing.

The YACC input file, also known as the YACC specification file, contains the following main components:

Declarations: This section, which comes before the first %% line, holds C declarations such as #include lines and global variables, enclosed between %{ and %}, together with YACC declarations such as %token.
Terminal symbols: The terminal symbols, or tokens, that the parser will recognize are declared in the declarations section, typically with the %token directive. Single-character tokens such as '+' can be used in rules without being declared.
Non-terminal symbols: Non-terminal symbols need no declaration of their own in standard YACC; any name that appears on the left-hand side of a grammar rule is a non-terminal. The %type directive can be used to give a non-terminal a value type.
Grammar rules: The rules are written between the first and second %% lines. Each rule starts with a non-terminal symbol, followed by a colon and a sequence of terminal and non-terminal symbols. Alternative right-hand sides for the same non-terminal are separated by a vertical bar (|), and the rule ends with a semicolon.
Start symbol: The start symbol is the non-terminal that the parser tries to recognize for the whole input. It is set with the %start directive and defaults to the left-hand side of the first rule.
Code section: This section, after the second %% line, contains supporting C code such as yylex(), yyerror(), and main(). The actions taken when a grammar rule is matched are written in curly braces inside the rules themselves.
The YACC input file can also include further declarations that customize the behavior of the parser. These include %union and %type for semantic value types and %left, %right, and %nonassoc for operator precedence and associativity; the %prec modifier is used inside a rule to override its precedence.

Here is an example of a simple calculator program written in C using the YACC (Yet Another Compiler Compiler) tool:
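
A minimal sketch of such a calculator is given below. It assumes integer arithmetic, one expression per input line, a single NUMBER token, and a hand-written yylex(); these are illustrative choices rather than the only way to write it.

```yacc
%{
#include <ctype.h>
#include <stdio.h>

int yylex(void);
void yyerror(const char *msg);
%}

%token NUMBER
%left '+' '-'            /* lowest precedence, left associative */
%left '*' '/'            /* binds tighter than '+' and '-' */

%%

input
    : /* empty */
    | input line
    ;

line
    : '\n'
    | expr '\n'          { printf("%d\n", $1); }
    ;

expr
    : NUMBER             /* default action: $$ = $1 */
    | expr '+' expr      { $$ = $1 + $3; }
    | expr '-' expr      { $$ = $1 - $3; }
    | expr '*' expr      { $$ = $1 * $3; }
    | expr '/' expr      { if ($3 == 0) { yyerror("division by zero"); YYABORT; }
                           $$ = $1 / $3; }
    ;

%%

/* Hand-written scanner: returns NUMBER, a single character, or 0 at end of input. */
int yylex(void)
{
    int c;

    while ((c = getchar()) == ' ' || c == '\t')
        ;                                   /* skip blanks, keep '\n' as a token */
    if (isdigit(c)) {
        yylval = c - '0';
        while (isdigit(c = getchar()))
            yylval = yylval * 10 + (c - '0');
        ungetc(c, stdin);                   /* push back the first non-digit */
        return NUMBER;
    }
    if (c == EOF)
        return 0;
    return c;
}

/* Called by the parser on a syntax error, and by the division action above. */
void yyerror(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
}

int main(void)
{
    return yyparse();                       /* 0 on success, nonzero on failure */
}
```

Assuming the file is saved as calc.y (a hypothetical name), a typical build is to run yacc on it and compile the generated y.tab.c with a C compiler. Because the grammar contains no error rules, the first syntax error ends the parse.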

This program defines a simple calculator that can evaluate expressions containing numbers and the operators +, -, *, and /. The YACC code defines the grammar for the calculator, and the code in the curly braces specifies the actions to be taken when a particular grammar rule is matched. The yylex() function is used to read in and return the next token, and the yyerror() function is called when an error is encountered in the input.

Error Handling

Error handling refers to the process of identifying, diagnosing, and resolving errors or bugs in a computer program or system. This process is important to ensure that the program or system runs smoothly and without interruption. Error handling can be done through the use of error messages, debugging tools, and other techniques, which help developers identify the cause of an error and take appropriate action to fix it. Error-checking code and other safeguards can also prevent some errors from occurring in the first place. In a parser generated by YACC, syntax errors are handled with error recovery rules written in the grammar, together with the yyerror() function that reports each error.

When the parser encounters a token that cannot legally follow the input it has seen so far, it calls yyerror() and enters error-recovery mode. In this mode it pops states off its stack until it reaches one in which the special error token is allowed, shifts error as if it had appeared in the input, and then discards input tokens until it finds one that can follow error in the grammar. If no rule mentions error, the parser simply stops and yyparse() reports failure.

Two classic recovery strategies can be expressed in a YACC grammar, both by means of the predefined error token: panic-mode error recovery and phrase-level error recovery. In panic-mode recovery, an error rule is placed at a high level of the grammar, such as a statement, and the parser discards input until it reaches a synchronizing token such as a semicolon, after which parsing resumes. In phrase-level recovery, error rules are written inside specific constructs, for example between a pair of parentheses, so that the parser can make a local correction and continue with the rest of the enclosing construct.
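
Panic-mode recovery can be sketched with a single error rule that resynchronizes at a semicolon. The grammar below is an illustrative assumption (statements are integer expressions terminated by ';'), not a fixed recipe.

```yacc
%{
#include <ctype.h>
#include <stdio.h>

int yylex(void);
void yyerror(const char *msg);
%}

%token NUMBER
%left '+' '-'

%%

program
    : /* empty */
    | program stmt
    ;

stmt
    : expr ';'           { printf("= %d\n", $1); }
    | error ';'          { yyerrok; }   /* skip to ';', then resume normal reporting */
    ;

expr
    : NUMBER
    | expr '+' expr      { $$ = $1 + $3; }
    | expr '-' expr      { $$ = $1 - $3; }
    ;

%%

/* Hand-written scanner that skips all whitespace, including newlines. */
int yylex(void)
{
    int c;

    while ((c = getchar()) != EOF && isspace(c))
        ;
    if (c == EOF)
        return 0;
    if (isdigit(c)) {
        yylval = c - '0';
        while (isdigit(c = getchar()))
            yylval = yylval * 10 + (c - '0');
        ungetc(c, stdin);
        return NUMBER;
    }
    return c;
}

void yyerror(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
}

int main(void)
{
    return yyparse();
}
```

With input such as "1 + ; 2 + 3 ;", the parser should report one error for the first statement, discard input up to its semicolon, and still evaluate the second statement. Without yyerrok the parser stays in its error state until three tokens have been shifted successfully, and it suppresses further error messages during that time.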

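YACC also has a mechanism for resolving conflicts, such as shift-reduce and reduce-reduce conflicts. These are not errors in the input: they are ambiguities in the grammar that YACC detects when it generates the parser. By default a shift-reduce conflict is resolved in favor of the shift and a reduce-reduce conflict in favor of the rule that appears earlier in the grammar; precedence and associativity declarations such as %left and %right let the grammar author choose a different resolution.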

Error handling in a YACC program can be done using the following techniques:

Error token: YACC reserves a special token named error. It can be written on the right-hand side of a rule to mark a point where the parser may resume after a syntax error.
Error rules: Rules that contain the error token are called error rules. They describe how to recover in a particular context; for example, a statement rule consisting of error followed by a semicolon lets the parser skip a malformed statement up to its terminating semicolon.
Error recovery: When an error occurs, the parser pops its stack until the error token can be shifted and then skips input until it can continue. The yyerrok macro, used in an action, tells the parser that recovery is complete so that later errors are reported normally, and the yyclearin macro discards the current lookahead token.
Error messages: The text of an error message reaches the user through yyerror(). The default message is usually a plain “syntax error”; an action can also call yyerror() itself with a more specific message for an error it detects.
yyerror() function: The parser calls the user-supplied yyerror() function whenever it detects a syntax error. It is typically used to print the message, often with a line number, and to count errors.
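
As an illustration of the last two items, the fragment below sketches a yyerror() that reports a line number and counts errors. It is a fragment, not a complete specification: the grammar rules are omitted, and the variables lineno and error_count are hypothetical names, with lineno assumed to be incremented by yylex() at each newline.

```yacc
%{
#include <stdio.h>

int lineno = 1;          /* hypothetical: kept up to date by yylex() */
int error_count = 0;     /* hypothetical: checked by main() after yyparse() */

void yyerror(const char *msg);
%}

/* token declarations omitted in this fragment */

%%

/* grammar rules omitted in this fragment */

%%

/* The parser calls this once for each syntax error it detects. */
void yyerror(const char *msg)
{
    ++error_count;
    fprintf(stderr, "line %d: %s\n", lineno, msg);
}
```

A main() function could then treat a nonzero error_count as failure even when error rules allowed yyparse() to reach the end of the input.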

In practice these techniques are used together: error rules decide where parsing resumes after an error, and yyerror() decides what the user is told about it.

Terminologies

yylex(): yylex() is the lexical analyzer function that the parser calls each time it needs the next token. It returns an integer token code, or zero at the end of the input, and passes any associated value to the parser through the variable yylval. It can be written by hand or generated by a tool such as lex from a specification file that defines the patterns for the tokens of a language.
yyerror(): yyerror() is the function that the parser calls when a parsing error is encountered. It receives the error message as a string and is typically used to print it and/or take other appropriate action. The function is normally defined by the user and can be customized to suit their needs.
Parser: A parser is a program that analyzes a sequence of tokens to determine its structure according to a grammar. In a compiler, the parser checks the syntax of the source program and, through its actions, can build a parse tree, generate code, or produce other output.
yyparse(): yyparse() is the parser function that YACC generates from the grammar defined in the YACC file. It calls yylex() repeatedly to obtain tokens, runs the actions attached to the rules it recognizes, and returns zero if the input was parsed successfully and a nonzero value otherwise.