Most of us have heard of lex, a tool that generates a lexical analyser used to tokenise input streams, and yacc, a parser generator. Python implementations of these two tools exist as separate modules in a package called PLY. These modules are named lex.py and yacc.py, and they work similarly to the original UNIX tools.
PLY differs from its UNIX counterparts in that it doesn't require a special input file; instead, it takes a Python program directly as input. The traditional tools also generate parsing tables, which are expensive to build, whereas PLY caches the generated tables, saves them for reuse, and regenerates them only when needed.
lex.py is one of the key modules in this package: yacc.py depends on it, since lex.py is responsible for breaking the input text into a collection of tokens, which are identified using regular expression rules.
To import this module in your Python code, use
import ply.lex as lex
Suppose you wrote a simple expression: y = a + 2 * b
When this is passed through
lex.py, the following tokens are generated:
'y', '=', 'a', '+', '2', '*', 'b'
Each generated token is associated with a token name, and the list of token names is always required.
# Token list
tokens = ('NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE')

# Regular expression rules for these tokens
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
More specifically, each token can be represented as a tuple of token type and token value:
('ID', 'y'), ('EQUALS', '='), ('ID', 'a'), ('PLUS', '+'), ('NUMBER', '2'), ('TIMES', '*'), ('ID', 'b')
This module also provides an external interface in the form of a
token() function, which returns the next valid token from the input.
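As a sketch of how these pieces fit together, here is a small self-contained lexer for the expression above; the exact token names (ID, EQUALS, and so on) and the regular expressions are our own choices for illustration:

```python
import ply.lex as lex

# Token names are always required.
tokens = ('ID', 'EQUALS', 'NUMBER', 'PLUS', 'TIMES')

# Regular expression rules for simple tokens.
t_EQUALS = r'='
t_PLUS = r'\+'
t_TIMES = r'\*'
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'

# A function rule lets us convert the matched text to an int.
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Ignore spaces and tabs.
t_ignore = ' \t'

def t_error(t):
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("y = a + 2 * b")

# token() returns the next valid token, or None at end of input.
result = []
while True:
    tok = lexer.token()
    if not tok:
        break
    result.append((tok.type, tok.value))

print(result)
# [('ID', 'y'), ('EQUALS', '='), ('ID', 'a'), ('PLUS', '+'),
#  ('NUMBER', 2), ('TIMES', '*'), ('ID', 'b')]
```

Note that the NUMBER token carries an integer value here because the function rule converted it; string rules like t_PLUS keep the matched text as the value.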
Another module of this package is
yacc, which stands for Yet Another Compiler Compiler. It can be used to implement one-pass compilers. It provides many of the features already available in UNIX
yacc, plus some extra features that give
yacc.py advantages over the traditional tool.
You can use the following to import
yacc into your Python code:
import ply.yacc as yacc
These features include:
- LALR(1) parsing
- Grammar validation
- Support for empty productions
- Extensive error checking capability
- Ambiguity resolution
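To make the last point concrete, here is a minimal sketch of ambiguity resolution via a precedence table (the token names, rule names, and grammar are our own illustrative choices): the grammar below is ambiguous on its own, but the precedence declaration tells yacc.py that TIMES binds tighter than PLUS.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'TIMES')

t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' '

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Without this table, the grammar below has shift/reduce conflicts;
# later entries bind tighter, so TIMES has higher precedence than PLUS.
precedence = (
    ('left', 'PLUS'),
    ('left', 'TIMES'),
)

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression TIMES expression'''
    p[0] = p[1] + p[3] if p[2] == '+' else p[1] * p[3]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]

def p_error(p):
    print("Syntax error")

lexer = lex.lex()
parser = yacc.yacc(write_tables=False, debug=False)

print(parser.parse("1 + 2 * 3", lexer=lexer))  # 7, not 9
```

Because TIMES outranks PLUS, "1 + 2 * 3" is grouped as 1 + (2 * 3); without the precedence table, yacc.py would report shift/reduce conflicts and default to shifting.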
The token() function described above is also used by
yacc.py, which calls it repeatedly on demand to collect tokens and match them against the grammar rules.
The output of yacc.py is often an Abstract Syntax Tree (AST), built up by the grammar-rule functions.
Advantage over UNIX yacc:
yacc.py doesn't involve a code-generation step; instead, it uses reflection to build its lexers and parsers. This saves space, since it requires no extra compiler-construction step and no generated code files.
To import the tokens from your lex file, use
from lex_file_name_here import tokens, where
tokens is the list of tokens specified in the lex file.
To specify the grammar rules, we define functions in our
yacc file. Each function name must begin with p_, and the production itself is written in the function's docstring:
def p_function_name_here(symbol):
    'expression : expression token_name term'
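Putting the pieces together, here is a hedged end-to-end sketch that parses the running example y = a + 2 * b into nested tuples serving as an AST. The rule names, the tuple-based AST shape, and the choice to define the tokens in the same file (rather than importing them from a separate lex file) are all our own:

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('ID', 'EQUALS', 'NUMBER', 'PLUS', 'TIMES')

t_EQUALS = r'='
t_PLUS = r'\+'
t_TIMES = r'\*'
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_ignore = ' '

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Grammar rules: the production lives in the docstring,
# p[0] receives the value built for the left-hand side.
def p_statement_assign(p):
    'statement : ID EQUALS expression'
    p[0] = ('=', p[1], p[3])

def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = ('+', p[1], p[3])

def p_expression_term(p):
    'expression : term'
    p[0] = p[1]

def p_term_times(p):
    'term : term TIMES factor'
    p[0] = ('*', p[1], p[3])

def p_term_factor(p):
    'term : factor'
    p[0] = p[1]

def p_factor_number(p):
    'factor : NUMBER'
    p[0] = p[1]

def p_factor_id(p):
    'factor : ID'
    p[0] = p[1]

def p_error(p):
    print("Syntax error")

lexer = lex.lex()
parser = yacc.yacc(write_tables=False, debug=False)

ast = parser.parse("y = a + 2 * b", lexer=lexer)
print(ast)  # ('=', 'y', ('+', 'a', ('*', 2, 'b')))
```

Stratifying the grammar into expression, term, and factor gives TIMES higher precedence than PLUS without needing a precedence table, so the multiplication nests inside the addition in the resulting tree.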