PLY (Python Lex-Yacc) – An Introduction
  • Last Updated : 26 May, 2020

Most of us have heard of lex, a tool that generates lexical analysers used to tokenize input streams, and yacc, a parser generator. PLY is a pure-Python implementation of these two tools, provided as separate modules in a single package.

These modules are named lex.py and yacc.py, and they work similarly to the original UNIX tools lex and yacc.

PLY differs from its UNIX counterparts in that it doesn’t require a special input file; instead, the lexer and parser specifications are written directly in a Python program. Like the traditional tools, PLY relies on parsing tables, which are expensive to construct, so it caches the generated tables and regenerates them only as and when needed.

PLY components

lex.py

This is one of the key modules in the package, because yacc.py depends on it. lex.py is responsible for breaking the input text into a collection of tokens, each of which is identified by a regular expression rule.



To import this module in your Python code, use import ply.lex as lex

Example:
Suppose you wrote a simple expression: y = a + 2 * b

When this is passed through lex.py, the following tokens are generated

'y', '=', 'a', '+', '2', '*', 'b'

Each generated token is usually identified by a token name. The list of token names is always required.

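# Token list for the above tokens (this list is always required)
tokens = ('ID', 'NUMBER', 'EQUALS', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE')

# Regular expression rules for the above example
t_ID      = r'[a-zA-Z_][a-zA-Z_0-9]*'
t_NUMBER  = r'\d+'
t_EQUALS  = r'='
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'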

More specifically, these can be represented as tuples of token type and token value

('ID', 'y'), ('EQUALS', '='), ('ID', 'a'), ('PLUS', '+'), 
('NUMBER', '2'), ('TIMES', '*'), ('ID', 'b')

This module also provides an external interface in the form of token(), which returns the next valid token from the input.
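The tokenization described above can be sketched with only the standard library. This is not PLY itself: it is a plain re-based illustration of the same idea, and the names rules, master and tokenize are hypothetical helpers chosen for this example.

```python
import re

# (token_type, pattern) pairs, mirroring the t_NAME rules shown above.
# Assumption: this is a stand-in written with `re`, not lex.py.
rules = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[A-Za-z_][A-Za-z_0-9]*'),
    ('EQUALS', r'='),
    ('PLUS',   r'\+'),
    ('MINUS',  r'-'),
    ('TIMES',  r'\*'),
    ('DIVIDE', r'/'),
    ('SKIP',   r'[ \t]+'),   # whitespace is matched but not reported
]

# Combine every rule into one master pattern made of named groups.
master = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in rules))

def tokenize(text):
    """Yield (token_type, token_value) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        match = master.match(text, pos)
        if match is None:
            raise SyntaxError(f'Illegal character {text[pos]!r} at position {pos}')
        pos = match.end()
        if match.lastgroup != 'SKIP':
            yield (match.lastgroup, match.group())

result = list(tokenize('y = a + 2 * b'))
print(result)
```

Each call to next() on the generator plays a role similar to a token() call: it hands back one token at a time, so a parser can pull tokens only when it needs them.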

yacc.py

Another module of this package is yacc.py, where yacc stands for Yet Another Compiler-Compiler. It can be used to implement one-pass compilers. It provides most of the features already available in UNIX yacc, plus some extra features that give yacc.py some advantages over traditional yacc.

You can import yacc into your Python code with import ply.yacc as yacc.



The features of yacc.py include:

  1. LALR(1) parsing
  2. Grammar Validation
  3. Support for empty productions
  4. Extensive error checking capability
  5. Ambiguity Resolution

The token() function is also used by yacc.py, which calls it repeatedly, on demand, to collect tokens and match them against the grammar rules. yacc.py is often used to produce an Abstract Syntax Tree (AST) as output.

Advantage over UNIX yacc:
Being a Python implementation, yacc.py doesn’t involve a separate code-generation step. Instead, it uses reflection to build its lexers and parsers, which saves space because no extra compiler-construction step or generated code file is required.
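To make the reflection idea concrete, here is a minimal standard-library sketch. It does not reproduce PLY's internals: rules_module and collect_rules are hypothetical names, and the sketch only shows how a tool can discover t_-prefixed rules by inspecting an object at run time instead of generating code.

```python
import re
import types

# Hypothetical stand-in for a user's lexer module (an assumption for this sketch).
rules_module = types.SimpleNamespace(
    tokens=('NUMBER', 'PLUS', 'TIMES'),
    t_NUMBER=r'\d+',
    t_PLUS=r'\+',
    t_TIMES=r'\*',
)

def collect_rules(namespace):
    """Look up the attribute t_<TOKEN> for every declared token name."""
    found = {}
    for name in namespace.tokens:
        pattern = getattr(namespace, 't_' + name, None)
        if pattern is None:
            raise ValueError(f'No rule defined for token {name!r}')
        found[name] = re.compile(pattern)
    return found

compiled = collect_rules(rules_module)
print(sorted(compiled))
```

Because the rules are read from live Python objects, there is no intermediate file to generate or compile, which is the advantage described above.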

To import the tokens from your lex file, use from lex_file_name_here import tokens, where tokens is the list of token names specified in the lex file.

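To specify the grammar rules, we define functions in our yacc file. Each function name starts with p_, its docstring holds the grammar rule, and its argument p holds the values of the symbols in that rule (p[0] is the result). The syntax is as follows:

def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]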
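As a hedged sketch of the convention yacc.py uses for rule functions (a p_ prefix, the grammar rule in the docstring, and an indexable argument p), the following standard-library example reads the productions back out of the docstrings and applies one action by hand. It does not build an LALR(1) parser, and collect_grammar is a hypothetical helper, not part of PLY.

```python
def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]

def p_expression_term(p):
    'expression : term'
    p[0] = p[1]

def p_term_number(p):
    'term : NUMBER'
    p[0] = p[1]

def collect_grammar(functions):
    """Return (lhs, rhs_symbols) for each rule function, read from its docstring."""
    productions = []
    for func in functions:
        lhs, rhs = func.__doc__.split(':', 1)
        productions.append((lhs.strip(), rhs.split()))
    return productions

grammar = collect_grammar([p_expression_plus, p_expression_term, p_term_number])
for lhs, rhs in grammar:
    print(lhs, '->', ' '.join(rhs))

# Apply one action by hand: p[1], p[2], p[3] hold the values of
# 'expression', 'PLUS' and 'term'; the action stores its result in p[0].
p = [None, 2, '+', 3]
p_expression_plus(p)
print(p[0])  # 5
```

In a real yacc.py parser the list-like p object is supplied by the parser during a reduction; here it is filled in manually only to show how the indices line up with the symbols in the rule.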

 
References:
https://www.dabeaz.com/ply/ply.html
