
Scala Parser Combinators


When a parser generator is needed, a few well-known tools come to mind: Yacc and Bison for parsers written in C, and ANTLR for parsers written in Java. However, each of them is designed to work with specific programming languages, which narrows where it can be used. Scala provides a helpful alternative. Instead of using the standalone domain-specific language of a parser generator, one can use an internal domain-specific language (internal DSL for short). The internal DSL consists of a library of parser combinators: functions and operators defined in Scala that serve as building blocks for parsers.
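To make the idea of "functions as building blocks" concrete, here is a minimal hand-rolled sketch. It is not the Scala library's implementation: the names `P`, `lit`, `seq`, `alt` and `rep` are hypothetical and chosen only for this illustration. A parser is modelled as a function from an input string to an optional pair of result and remaining input, and each combinator builds a larger parser from smaller ones.

```scala
// Hand-rolled illustration only; the real library uses richer types.
object MiniCombinators {
  // A parser takes the input and returns (result, remaining input), or None on failure.
  type P[A] = String => Option[(A, String)]

  // Primitive parser: match a literal string at the start of the input.
  def lit(s: String): P[String] =
    in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None

  // Sequencing: run p, then run q on whatever p left over.
  def seq[A, B](p: P[A], q: P[B]): P[(A, B)] =
    in => p(in).flatMap { case (a, rest) =>
      q(rest).map { case (b, rest2) => ((a, b), rest2) }
    }

  // Alternation: try p first and fall back to q if p fails.
  def alt[A](p: P[A], q: P[A]): P[A] =
    in => p(in).orElse(q(in))

  // Repetition: apply p zero or more times; this parser never fails.
  def rep[A](p: P[A]): P[List[A]] = in => {
    @annotation.tailrec
    def loop(acc: List[A], rest: String): (List[A], String) = p(rest) match {
      // The length check stops the loop if p succeeds without consuming input.
      case Some((a, next)) if next.length < rest.length => loop(a :: acc, next)
      case _ => (acc, rest)
    }
    val (items, rest) = loop(Nil, in)
    Some((items.reverse, rest))
  }
}
```

With these definitions, `seq(lit("a"), rep(alt(lit("b"), lit("c"))))` is a parser for an "a" followed by any number of "b" or "c" characters. The library's `~`, `|` and `rep` play the same roles as `seq`, `alt` and `rep` here.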

To follow this article, you need basic knowledge of compilers and an understanding of regular and context-free languages.

The first step is always to write down a grammar for the language to be parsed. For arithmetic expressions, the grammar has three parts:
Expression: An expression (represented by expr) is a term, possibly followed by a sequence of ‘+’ or ‘-’ operators and further terms.
Term: A term is a factor, possibly followed by a sequence of ‘*’ or ‘/’ operators and further factors.
Factor: A factor is either a numeric literal or an expression in parentheses.

Grammar for the arithmetic expression parser:

expr ::= term {"+" term | "-" term}.
term ::= factor {"*" factor | "/" factor}.
factor ::= floatingPointNumber | "(" expr ")".

| denotes alternative productions
{ … } denotes repetition (zero or more times)

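Scala code for the above grammar:

import scala.util.parsing.combinator._

class Arith extends JavaTokenParsers {
    // expr ::= term {"+" term | "-" term}.
    def expr: Parser[Any] = term~rep("+"~term | "-"~term)
    // term ::= factor {"*" factor | "/" factor}.
    def term: Parser[Any] = factor~rep("*"~factor | "/"~factor)
    // factor ::= floatingPointNumber | "(" expr ")".
    def factor: Parser[Any] = floatingPointNumber | "("~expr~")"
}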


The parsers for arithmetic expressions are contained in a class that inherits from the trait JavaTokenParsers.

Steps for converting a context-free grammar into code:

  1. Every production becomes a method, hence add a prefix ‘def’.
  2. The result type of each method is Parser[Any], so we need to change the ::= symbol to “: Parser[Any] =”.
  3. In the grammar, sequential composition was implicit, but in the program it is expressed by an explicit operator: ~. So we need to insert a ‘~’ between every two consecutive symbols of a production.
  4. Repetition is expressed rep( … ) instead of { … }.
  5. The period (.) at the end of each production is omitted; a semicolon (;) may optionally be written in its place.
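As a sketch of these steps applied to a second, smaller grammar, consider nested lists of whole numbers. The grammar, the class name `NestedList` and the helper `accepts` are hypothetical and exist only for this illustration. The example assumes the scala-parser-combinators module is on the classpath; `wholeNumber` is a parser provided by `JavaTokenParsers`.

```scala
import scala.util.parsing.combinator._

// Hypothetical grammar, used only to illustrate the conversion steps:
//   list ::= "[" item {"," item} "]".
//   item ::= wholeNumber | list.
class NestedList extends JavaTokenParsers {
  // Steps 1 and 2: each production becomes a method of type Parser[Any].
  // Step 3: a ~ goes between consecutive symbols. Step 4: { ... } becomes rep( ... ).
  def list: Parser[Any] = "[" ~ item ~ rep("," ~ item) ~ "]"
  def item: Parser[Any] = wholeNumber | list
}

object NestedListDemo extends NestedList {
  // True when the whole input matches the `list` production.
  def accepts(input: String): Boolean = parseAll(list, input).successful
}
```

For instance, `NestedListDemo.accepts("[1, [2, 3], 4]")` should report a match, while an unbalanced input such as `"[1, 2"` should not.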

Test whether your parser works or not with the code below!




object ParseExpr extends Arith {
    def main(args: Array[String]): Unit = {
        // Echo the input, then print the result of parsing it as an expression.
        println("input: " + args(0))
        println(parseAll(expr, args(0)))
    }
}


The ParseExpr object defines a main method that parses the first command line argument passed to it. Parsing is done by the expression: parseAll(expr, input)

We can run the arithmetic parser with the following command:

$ scala ParseExpr "4 * (5 + 7)" 
input: 4 * (5 + 7) 
[1.12] parsed: ((4~List((*~(((~((5~List())~List((+ ~(7~List())))))~)))))~List())

The output tells us that the parser successfully analyzed the input string up to position [1.12], that is, line 1, column 12. Since the input is 11 characters long, column 12 lies just past its last character, so the whole input string was parsed.
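The position can also be read programmatically instead of from the printed string. The following is a minimal sketch, assuming the scala-parser-combinators module is on the classpath. The object `PositionDemo` and the helper `endPosition` are hypothetical names, and the `Arith` grammar is repeated so that the block runs by itself.

```scala
import scala.util.parsing.combinator._

class Arith extends JavaTokenParsers {
  def expr: Parser[Any] = term ~ rep("+" ~ term | "-" ~ term)
  def term: Parser[Any] = factor ~ rep("*" ~ factor | "/" ~ factor)
  def factor: Parser[Any] = floatingPointNumber | "(" ~ expr ~ ")"
}

object PositionDemo extends Arith {
  // Returns (line, column) of the point where a successful parse stopped,
  // or None when the input is not a valid expression.
  def endPosition(input: String): Option[(Int, Int)] = {
    val result = parseAll(expr, input)
    if (result.successful) Some((result.next.pos.line, result.next.pos.column))
    else None
  }
}
```

Here `result.next` is the remaining input after the parse, and its `pos` carries the same line and column that appear in brackets in the printed result.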

We can also check that the parser reports an error for invalid input.
Example:

$ scala ParseExpr "2 * (3 + 7))" 
input: 2 * (3 + 7)) 
[1.12] failure: `-' expected but `)' found
2 * (3 + 7))
           ^

The expr parser parsed everything except the final closing parenthesis, which does not form part of the arithmetic expression. The ‘parseAll’ method then reported a failure, saying that it expected a ‘-’ operator at the position where it found the extra closing parenthesis.
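The same check can be done in code rather than by reading the printed output. This is a minimal sketch, assuming the scala-parser-combinators module is on the classpath. The object `ArithCheck` and the method `check` are hypothetical names, and the `Arith` grammar is repeated so that the block runs by itself. The exact wording of the failure message may differ between library versions, so the sketch only distinguishes success from failure.

```scala
import scala.util.parsing.combinator._

class Arith extends JavaTokenParsers {
  def expr: Parser[Any] = term ~ rep("+" ~ term | "-" ~ term)
  def term: Parser[Any] = factor ~ rep("*" ~ factor | "/" ~ factor)
  def factor: Parser[Any] = floatingPointNumber | "(" ~ expr ~ ")"
}

object ArithCheck extends Arith {
  // Right(parse tree) when the whole input is a valid expression,
  // Left(printed failure) otherwise.
  def check(input: String): Either[String, Any] = {
    val result = parseAll(expr, input)
    if (result.successful) Right(result.get) else Left(result.toString)
  }
}
```

Calling `ArithCheck.check("2 * (3 + 7))")` should yield a `Left` describing the failure, while the balanced input `"2 * (3 + 7)"` should yield a `Right` holding the parse result.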



Last Updated : 21 Nov, 2019