Lately I've been using the Parsec library for Haskell to write a parser and interpreter for a university assignment. Right-recursive grammars are trivial to parse with combinatorial parsers; tail recursion and backtracking make this simple. However, implementing a left-recursive grammar will often result in an infinite loop, as is the case in Parsec when using basic parsers.
Parsec does support left-recursion however. Unsatisfied with the lack of good tutorials when I googled for advice, I decided to write this. Hopefully it helps someone. If I can make this better or easier to understand, please let me know!
Left recursive parsing can be achieved in Parsec using chainl1.
Parsec does support left-recursion however. Unsatisfied with the lack of good tutorials when I googled for advice, I decided to write this. Hopefully it helps someone. If I can make this better or easier to understand, please let me know!
Left recursive parsing can be achieved in Parsec using chainl1.
chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
As an example of how to use chainl1, I'll demonstrate its use in parsing basic integer addition and subtraction expressions.
First we'll need an abstract syntax tree to represent an integer expression. This can represent a single integer constant, or an addition / subtraction operation which involves two integer expressions.
data IntExp = Const Int | Add IntExp IntExp | Sub IntExp IntExp
If addition and subtraction were to be right-associative, we'd parse the left operand as a constant, and attempt to parse the right operand as another integer expression. Upon failing, we'd backtrack and instead attempt to parse an integer constant. Reversing this approach to make the expressions left-associative would cause infinite recursion; we'd attempt to parse the left operand as an integer expression, which attempts to parse the left operand as an integer expression, which tries to... you get the point.
Instead we use chainl1 with two parsers; one to parse an integer constant, and another which parses a symbol and determines if the expression is an addition or subtraction.
parseIntExp :: Parser IntExp parseIntExp = chainl1 parseConstant parseOperation parseOperation :: Parser (IntExp -> IntExp -> IntExp) parseOperation = do spaces symbol <- char '+' <|> char '-' spaces case symbol of '+' -> return Add '-' -> return Sub parseConstant :: Parser IntExp parseConstant = do xs <- many1 digit return $ Const (read xs :: Int)
Here, parseOperation returns either the Add or Sub tag of IntExp. Using GHCi, you can confirm the type of Add as:
Add :: IntExp -> IntExp -> IntExp
So, we have a parser which will parse a constant and a parser which will parse a symbol and determine what type of operation an expression is. In parseIntExp, chainl1 is the glue which brings these together. This is what allows left-associative parsing without infinitely recursing.
A complete code sample is available here. The abstract syntax tree has been created as an instance of the Show typeclass to print in a more readable format, which shows that the grammar is indeed left-associative.
ghci> run parseIntExp "2 + 3 - 4" ((2+3)-4)