TL;DR

Here’s the algorithm for pratt(precLimit):

Pratt parser algorithm

Here’s a minimal commented Pratt parser example in Elm.

Introduction

Pratt parsers are a beautiful way of solving the operator precedence problem:

How can an expression like 1+2-3*4+5/6^7-8*9 be parsed to meet the expectations of your PEMDAS-trained brain? Where do you put the parentheses? What goes first?

The underlying problem is the frequent ambiguity in combinations of operators. For example, how should the expression 1+2*3 be interpreted?

There are two options, each leading to a different value.

Ambiguity of multiple binary operators

There are a few algorithms to solve this problem. Among them, my favorite is the Pratt parser—the most intuitive and declarative solution in my view.

A Pratt parser is an algorithm that takes your tokens ([1, +, 2, *, 3]) and a table like:

Operator    Precedence
+           1
-           1
*           2
/           2
^           3

and spits out the correctly parenthesized/nested expression (1 + (2 * 3)). So nice!

But how do they work?

Bob Nystrom’s article

The original paper “Top Down Operator Precedence” (PDF or this beautiful HTML rendering) by Vaughan Pratt has a lot of theory, theorems and proofs, is somewhat philosophical at times, and spends quite some time dealing with problems of its time (1973).

Of course, as an academic paper it has a different target audience, so let’s not hold that against it. It also makes sense for it (introducing a new idea) to discuss motivation. But that makes it all the harder to find the important parts in the paper.

And I suspect that’s the reason people write summaries. You don’t need the 1973 motivation, you just need to solve the operator precedence problem!

The best summary and explanation of Pratt parsers I know of (at least for the audience of an amateur programming language designer in 2023) was written by Bob Nystrom in his blogpost “Pratt Parsers: Expression Parsing Made Easy”. You should definitely read it! (Along with Bob’s other writing, like Crafting Interpreters.)

The examples Bob gives are written in Java, and the full example implementation is split across multiple small classes.

I feel that we can do a bit better pedagogically: it can be tricky to make sense of the idea when it's spread out like this, and it can also be challenging to port the idea from OOP to an FP language.

Not that you’d be unable to make sense of the Java code, but I see pedagogical value in trying to simplify even more.

So, that’s my motivation for writing this article! Let’s try and see how a more minimal and concise implementation would look in Elm!

Spoiler: here it is.

The language is pure, immutable, has no escape hatches, has no spooky action at a distance (eg. default arguments or exceptions) and no obscure features (except for perhaps sum types, but even those are thankfully getting more and more common in other languages).

Hence, if you can write it in Elm, you can likely write it in anything else.

Overview

The idea of Pratt parsing is very simple:

Each operator has a precedence (a number) associated with it. Larger number means higher priority, eg. multiplication has higher priority than addition.

The Pratt parser holds a precedence limit. It can be thought of as: “You’re only allowed to parse operators with precedence higher than this number.”

The precedence limit of the top-level expression is 0. Thus you could define your expr(tokens) parser as pratt(0,tokens).

The Pratt parser first parses a prefix expression (the left part of the Op(left, op, right)), then starts a loop.

In the loop it looks at the next token. If there are no tokens left, or if it’s not a known binary operator, or if it is but its precedence is equal to or lower than the limit, the parser ends and returns the left expression, along with information about which token to look at next (= whatever’s immediately after the left expression).

If the token is a known binary operator with higher precedence than the limit held by the Pratt parser, the parser spawns another Pratt parser with a limit equal to the operator’s precedence.

Whatever that inner Pratt parser returns (the right part of the op), the outer one combines into Op(left, op, right) which becomes the left for the next loop.

That way the Pratt parser manages to consume all operations above its limit. A Pratt parser consuming * and / will also consume ^ into a (parenthesized) child via the nested pratt(...) calls, but will never touch + and -.

…Well, that was a mouthful. Here it is in the flowchart form again:

Pratt parser algorithm

I wouldn’t blame you if it didn’t click right away! To aid understanding, let’s examine what the algorithm does from a few different perspectives: visually, then reading through an execution trace, and then finally the code itself.

Visual intuition

Here’s the expression we’re interested in parsing:

Original expression

If we focus on adding parentheses (instead of tree creation), we want to end up with something similar to the following:

Expression with added parentheses

Let’s assign the precedence values to the operators:

Operator    Precedence
+           1
-           1
*           2
/           2
^           3

Now let’s put operators on separate lines according to their precedence:

Separated

In order to get some visual intuition on this problem, we’ll repeat a simple process on each level: given a region in the level above, we’ll draw regions between the operators on our current level:

Process we'll follow

At the very top (p=0) we’ll start with one large region:

p=0

One level below (p=1), we split the region whenever we encounter an operator on our level—that is, + or -.

p=1

Continuing on to p=2, we’ll split each blue region when we find * or /.

To make the diagram less cluttered, we’ll skip processing regions with single numbers, eg. here 1 and 2.

p=2

And then finally on level p=3 we split the region containing the ^.

p=3

By now perhaps you already see what’s happening: each blue region represents a set of parentheses!

Let’s clean the diagram up a little before we show that off: parentheses with no operators inside aren’t very helpful (eg. (3) vs 3), so we’ll remove the blue regions that only hold single numbers:

Cleaned up

And now we can finally replace each beginning of a blue region with ( and each end with ):

Converted

Merging each level into the one above, we end up with the parenthesized expression we wanted:

Merged

What we have been doing visually corresponds to the Pratt parser algorithm: the blue regions on level p=n are calls to the pratt(n) parser. (Except for the leftmost numbers of each block, which are handled by the prefix parser.)

Let’s now walk through the algorithm on an example to see that. (I swear there’s going to be some code in this blogpost eventually.)

Walk-through example

The source we’re trying to parse is 1+2-3*4+5/6^7-8*9.

It corresponds to a sequence of tokens:

[ 1, +, 2, -, 3, *, 4, +, 5, /, 6, ^, 7, -, 8, *, 9 ]

The precedences are as follows:

Operator    Precedence
+           1
-           1
*           2
/           2
^           3

In this simplified example, there are no unary operators. We’ll talk about those near the end of the blogpost.

Step through the execution to see what’s happening:
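
Here’s an abridged trace of pratt 0 on these tokens (indentation marks nested pratt calls; the last two pratt 1 calls work analogously to the earlier ones):

pratt 0  [1, +, 2, -, 3, *, 4, +, 5, /, 6, ^, 7, -, 8, *, 9]
    prefix            -> left = 1
    sees + (1 > 0)    -> pratt 1  [2, -, 3, ...]
                             prefix              -> left = 2
                             sees - (1, not > 1) -> return 2
                      -> left = (1 + 2)
    sees - (1 > 0)    -> pratt 1  [3, *, 4, +, ...]
                             prefix              -> left = 3
                             sees * (2 > 1)      -> pratt 2  [4, +, ...]
                                                        prefix              -> left = 4
                                                        sees + (1, not > 2) -> return 4
                                                 -> left = (3 * 4)
                             sees + (1, not > 1) -> return (3 * 4)
                      -> left = ((1 + 2) - (3 * 4))
    sees + (1 > 0)    -> pratt 1  [5, /, 6, ^, 7, ...]   returns (5 / (6 ^ 7))
                      -> left = (((1 + 2) - (3 * 4)) + (5 / (6 ^ 7)))
    sees - (1 > 0)    -> pratt 1  [8, *, 9]              returns (8 * 9)
                      -> left = ((((1 + 2) - (3 * 4)) + (5 / (6 ^ 7))) - (8 * 9))
    out of tokens     -> done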

Elm implementation

Hopefully you noticed the parallels between the visual interpretation and the flowchart’s recursive descent down the stack frames.

Each Pratt parser is able to handle a single precedence level and delegate the higher levels to child Pratt parsers, while refusing to do anything for lower levels.

Well, let’s formalize what we just went through!

First, let’s define the types we’ll work with.

type Token
    = TNum Int
    | TOp Binop

type Binop
    = Add
    | Sub
    | Mul
    | Div
    | Pow

type Expr
    = Num Int
    | Op Expr Binop Expr

Our token list and operator precedence table:

exampleTokens : List Token
exampleTokens =
    [ TNum 1
    , TOp Add
    , TNum 2
    , TOp Sub
    , TNum 3
    , TOp Mul
    , TNum 4
    , TOp Add
    , TNum 5
    , TOp Div
    , TNum 6
    , TOp Pow
    , TNum 7
    , TOp Sub
    , TNum 8
    , TOp Mul
    , TNum 9
    ]

precedence : Binop -> Int
precedence binop =
    case binop of
        Add -> 1
        Sub -> 1
        Mul -> 2
        Div -> 2
        Pow -> 3

And now, the fun begins.

parse : List Token -> Maybe Expr
parse tokens =
    case pratt 0 tokens of
        Just ( expr, tokensAfterExpr ) -> Just expr
        Nothing -> Nothing

The top-level parser calls pratt 0 and then throws away the tokens it returns (only keeps the parsed expression).
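
The case expression only unwraps the tuple, so if you prefer, the same function can be written more compactly with Maybe.map:

parse : List Token -> Maybe Expr
parse tokens =
    pratt 0 tokens
        |> Maybe.map Tuple.first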

We’ll get some preliminaries out of the way:

prefix : List Token -> Maybe (Expr, List Token)
prefix tokens =
    case tokens of
        []               -> Nothing
        (TNum n) :: rest -> Just ( Num n, rest )
        (TOp  _) :: _    -> Nothing

Note that in a more advanced language you’d probably have prefix operations like negation (-5) or explicit grouping via parentheses ((5+3)). We’ll talk about these near the end of the blogpost.

pratt : Int -> List Token -> Maybe (Expr, List Token)
pratt precLimit tokens =
    -- First parse a prefix expression
    case prefix tokens of
        Nothing -> Nothing

        Just ( left, tokensAfterPrefix ) ->
            prattLoop precLimit left tokensAfterPrefix


prattLoop : Int -> Expr -> List Token -> Maybe (Expr, List Token)
prattLoop precLimit left tokensAfterLeft =
    -- Is the next token an operator? If so, let's find its precedence.
    case tokensAfterLeft of
        -- Operator!
        (TOp op) :: tokensAfterOp ->
            let
                opPrec : Int
                opPrec = precedence op
            in
            -- Now, are we allowed to parse the next expression
            -- or is it outside the limit?
            if opPrec > precLimit then
                -- We can parse it! Spawn a child Pratt parser.
                case pratt opPrec tokensAfterOp of
                    -- Whatever the child Pratt parser did,
                    -- we take it and combine it with our `left`.
                    Just ( right, tokensAfterChild ) ->
                        let
                            newLeft : Expr
                            newLeft = Op left op right
                        in
                        -- There might be more on our level
                        -- (like in 1+2-3), so let's loop.
                        prattLoop
                            precLimit
                            newLeft
                            tokensAfterChild

                    -- An error, propagate it.
                    Nothing -> Nothing

            else
                -- We shouldn't parse this op.
                -- Return what we have.
                -- (Note our token list points at the op,
                -- not at the token after.)
                Just ( left, tokensAfterLeft )

        -- Either we ran out of tokens or found something
        -- that's not an operator. Let's return what we have.
        _ -> Just ( left, tokensAfterLeft )

And that’s all!

Running this—either in the elm repl or in the browser with something like:

module Main exposing (main)

import Html

main =
    parse exampleTokens
        |> Debug.toString
        |> Html.text

—will result in:

Just (Op (Op (Op (Op (Num 1) Add (Num 2))
                 Sub
                 (Op (Num 3) Mul (Num 4)))
             Add
             (Op (Num 5)
                 Div
                 (Op (Num 6) Pow (Num 7))))
         Sub
         (Op (Num 8) Mul (Num 9)))

as you can see in the Ellie with the full source.

With some elbow grease, this can be translated into:

((((1 + 2) - (3 * 4)) + (5 / (6 ^ 7))) - (8 * 9))

and if you remove the parentheses around the pluses and minuses, you’ll get the result we wanted:

1 + 2 - (3 * 4) + (5 / (6 ^ 7)) - (8 * 9)

Parsing successful!

Loose threads

The above illustrates the core idea of Pratt parsers on left-associative binary operators, but that isn’t enough for real-world languages. There are a few unanswered questions. Let’s answer them:

Right-associativity

What about right-associativity? ^ is right-associative, meaning 2^3^4 should parse into 2^(3^4).

Thankfully this is simple to solve: each operator will now have an isRightAssociative boolean associated with it in addition to the precedence level.

When the parser encounters an operator, it will use that boolean to calculate the precedence this way:

opPrec : Int
opPrec = precedence op

finalPrec : Int
finalPrec =
    if isRightAssociative op then
        opPrec - 1
    else
        opPrec

Then you need to use opPrec for the condition and finalPrec for the recursive call:

if opPrec > precLimit then
    case pratt finalPrec tokensAfterOp of
        ...
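
Put together, the modified loop might look something like this (just a sketch of the changed prattLoop; isRightAssociative is the per-operator boolean mentioned above):

isRightAssociative : Binop -> Bool
isRightAssociative binop =
    case binop of
        Pow -> True
        _   -> False

prattLoop : Int -> Expr -> List Token -> Maybe (Expr, List Token)
prattLoop precLimit left tokensAfterLeft =
    case tokensAfterLeft of
        (TOp op) :: tokensAfterOp ->
            let
                opPrec : Int
                opPrec = precedence op

                finalPrec : Int
                finalPrec =
                    if isRightAssociative op then
                        opPrec - 1
                    else
                        opPrec
            in
            -- The condition still uses opPrec...
            if opPrec > precLimit then
                -- ...but the child parser gets finalPrec. The lowered limit
                -- lets the child consume operators of the same precedence,
                -- which makes them nest to the right.
                case pratt finalPrec tokensAfterOp of
                    Just ( right, tokensAfterChild ) ->
                        prattLoop precLimit (Op left op right) tokensAfterChild

                    Nothing -> Nothing

            else
                Just ( left, tokensAfterLeft )

        _ -> Just ( left, tokensAfterLeft )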

And that’s it!

Here’s a runnable example with right associativity, parsing 1^2^3+4 into (1^(2^3))+4: Ellie.

More complex prefix expressions

In our example we only had numbers as the prefix expressions. What if we wanted negation and parentheses?

Let’s add a few new tokens to illustrate:

type Token
    = TNum Int
    | TOp Binop
    | TLeftParen  -- new
    | TRightParen -- new

We’ll reuse TOp Sub for negation; normally you’d have tokens like Minus and Plus rather than assigning them meaning during lexing.

This is how the new prefix parser would look:

prefix : List Token -> Maybe (Expr, List Token)
prefix tokens =
    case tokens of
        [] -> Nothing
        (TNum n) :: rest -> Just ( Num n, rest )

        -- negation, consumes two tokens
        (TOp Sub) :: (TNum n) :: rest ->
            Just ( Num (negate n), rest )

        -- left paren: consumes '(', an expression, ')'
        TLeftParen :: rest -> groupedExpr rest

        -- right paren: doesn't make sense on its own
        TRightParen :: _ -> Nothing
        (TOp  _) :: _ -> Nothing

groupedExpr : List Token -> Maybe (Expr, List Token)
groupedExpr tokens =
    -- Has consumed TLeftParen already
    case pratt 0 tokens of
        Nothing -> Nothing
        Just (expr, tokensAfterExpr) -> 
            case tokensAfterExpr of
                -- A closing paren must follow
                TRightParen :: rest -> Just (expr, rest)

                -- Unbalanced parens!
                _ -> Nothing
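
As a quick sanity check (parenExample is a made-up name, not part of the original example), tokens for (1 + 2) * 3 should now keep the parenthesized part together on the left:

parenExample : List Token
parenExample =
    -- (1 + 2) * 3
    [ TLeftParen, TNum 1, TOp Add, TNum 2, TRightParen, TOp Mul, TNum 3 ]

-- parse parenExample
--     == Just (Op (Op (Num 1) Add (Num 2)) Mul (Num 3))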

Postfix expressions

How to make Pratt parsers support postfix operators, like i++ or 5!?

These are a special case of infix operators:

Postfix expressions need the prefix expression parsed and the operator consumed, but they don’t need to consume any right part. Thus there would be one less pratt call for the right-hand side.

Postfix expressions are admittedly where I simplified my explanations too much. The Pratt parser really cares about two classes of operators: prefix, and anything else.

In the Elm code above I’ve baked the assumption that we’ll only deal with prefix and infix (binary) operators into the pratt function itself. In reality each operator token would have a parser associated with it that deals with whatever comes after the operator, and the pratt function would call that.

{-|
Example: 5+3

Token expectations:

    expr op expr
    ^^^^^^^ already parsed
-}
binaryOp : Expr -> Binop -> Int -> List Token -> Maybe (Expr, List Token)
binaryOp left op opPrec tokensAfterOp =
    -- what we had before in `pratt`, just extracted into its own function
    case pratt opPrec tokensAfterOp of
        Nothing -> Nothing
        Just (right, tokensAfterRight) ->
            Just (Op left op right, tokensAfterRight)

which then gives us the freedom to write postfix parsers. With some handwaving they could look like:

{-| 
Example: 5!

Token expectations:

    expr Bang
    ^^^^^^^^^ already parsed
-}
postfixBang : Expr -> Binop -> List Token -> Maybe (Expr, List Token)
postfixBang left op tokensAfterOp =
    -- We're guaranteed to have parsed the `!` already.
    -- Nothing left to do here!
    -- (UnaryOp and Factorial are hypothetical variants we'd add to Expr.)
    Just (UnaryOp Factorial left, tokensAfterOp)
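
To make the dispatch idea a bit more concrete, here is one hypothetical shape it could take (infixParserFor is a made-up name; in our toy language every operator still happens to be binary, so they all reuse binaryOp):

-- Hypothetical sketch: for an operator, return its precedence together with
-- the function that finishes parsing it, given the already-parsed `left` and
-- the tokens after the operator. `prattLoop` would call this instead of
-- hardcoding the binary case.
infixParserFor : Binop -> ( Int, Expr -> List Token -> Maybe ( Expr, List Token ) )
infixParserFor op =
    let
        opPrec : Int
        opPrec = precedence op
    in
    -- All of our operators are binary; a postfix operator would plug in
    -- something like `postfixBang` here instead.
    ( opPrec, \left tokensAfterOp -> binaryOp left op opPrec tokensAfterOp )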

Conclusion

Pratt parsers are nice in that they let you define a table of operators and their precedence levels, which they then work through: the traversal itself is abstracted away.

Pratt parsers can be shrouded in mystery (particularly if you encounter terms like nud and led while reading about them or using a library that didn’t care to find better names).

With that in mind I hope I have demystified this technique a little bit and made you consider using or writing one. Life’s too short to deal with operator precedence in an ad-hoc way!

