Wednesday, 26 October 2011

Left-recursion in Parsec

Lately I've been using the Parsec library for Haskell to write a parser and interpreter for a university assignment. Right-recursive grammars are trivial to parse with combinatorial parsers; tail recursion and backtracking make this simple. However, implementing a left-recursive grammar will often result in an infinite loop, as is the case in Parsec when using basic parsers.

Parsec does support left-recursion however. Unsatisfied with the lack of good tutorials when I googled for advice, I decided to write this. Hopefully it helps someone. If I can make this better or easier to understand, please let me know!

Left recursive parsing can be achieved in Parsec using chainl1.

chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a

As an example of how to use chainl1, I'll demonstrate its use in parsing basic integer addition and subtraction expressions.

First we'll need an abstract syntax tree to represent an integer expression. This can represent a single integer constant, or an addition / subtraction operation which involves two integer expressions.

data IntExp = Const Int
            | Add IntExp IntExp
            | Sub IntExp IntExp 

If addition and subtraction were to be right-associative, we'd parse the left operand as a constant, and attempt to parse the right operand as another integer expression. Upon failing, we'd backtrack and instead attempt to parse an integer constant. Reversing this approach to make the expressions left-associative would cause infinite recursion; we'd attempt to parse the left operand as an integer expression, which attempts to parse the left operand as an integer expression, which tries to... you get the point.

Instead we use chainl1 with two parsers; one to parse an integer constant, and another which parses a symbol and determines if the expression is an addition or subtraction.

parseIntExp :: Parser IntExp
parseIntExp =
  chainl1 parseConstant parseOperation

parseOperation :: Parser (IntExp -> IntExp -> IntExp)
parseOperation =
  do spaces
     symbol <- char '+' <|> char '-'
     spaces
     case symbol of
       '+' -> return Add
       '-' -> return Sub

parseConstant :: Parser IntExp
parseConstant =
  do xs <- many1 digit
     return $ Const (read xs :: Int)

Here, parseOperation returns either the Add or Sub tag of IntExp. Using GHCi, you can confirm the type of Add as:

Add :: IntExp -> IntExp -> IntExp

So, we have a parser which will parse a constant and a parser which will parse a symbol and determine what type of operation an expression is. In parseIntExp, chainl1 is the glue which brings these together. This is what allows left-associative parsing without infinitely recursing.

A complete code sample is available here. The abstract syntax tree has been created as an instance of the Show typeclass to print in a more readable format, which shows that the grammar is indeed left-associative.

ghci>  run parseIntExp "2 + 3 - 4"
((2+3)-4)

Thursday, 13 October 2011

Another reason I like C#

A lot of people seem to be finding their way here because of my post about finding the maximal area submatrix in a binary matrix in C#. I found this post from a couple of weeks ago which shows the same idea in Haskell. It's interesting to see a similar idea presented in a different language; especially one as awesome as Haskell.

After a couple of hours of writing in C# earlier I've found a feature of the language that I really like: indexors. They allow array-style indexing of user-defined classes, which would be particularly useful for data structures. I really like that something which has typically just been syntactic sugar for array access is available to developers. Why? Because we're too lazy to write foo.get(x, y) when foo[x, y] is also available.