Today’s guest post comes from Tom Ellis. If you haven’t heard, Tom released a novel library for interacting with relational databases this year - Opaleye. While similar to HaskellDB in some respects, Opaleye is distinct in its extensive use of arrows in order to guarantee safety of queries. In this post, Tom’s going to guide us through GHC’s special syntax support for the
Arrow type class.
In Haskell we use the
Monad typeclass to provide an interface for encoding certain forms of computation. The
Arrow typeclass provides an interface that is in many ways similar. Everything that can be made an instance of
Monad (in a law-abiding way) can also be adapted to be an
Arrow (in a law-abiding way). This means that the
Arrow interface is less powerful than
Monad, but that more things can be made instances of Arrow.
Working directly with
Monad combinators like
>> can sometimes be awkward so Haskell provides “
do notation” as an alternative way of writing monad expressions in a way that can often be clearer. Likewise there is an “arrow notation” for writing arrow expressions. It is not enabled in GHC by default but can be turned on with the
Arrows language extension.
The subject of this blog post is how to understand and use arrow notation. It will also serve as an introduction to the
Arrow typeclass whilst keeping mention of the arrow combinators to a minimum. We will develop our intuition for arrow computation rather than learn any technical and formal definitions!
Let’s kick off by refreshing our memory of
do notation. A very basic way to think of
do notation is that it is similar to a sequence of let bindings. For example, a basic Haskell expression to perform a numerical calculation might be written as a sequence of let bindings.
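The listing that belongs here is not present in this text, so the following is only a minimal sketch of such a calculation. The name f and the particular arithmetic are illustrative assumptions.

```haskell
-- A pure calculation written as a chain of let bindings.
-- The name f and the arithmetic are illustrative assumptions.
f :: Int -> (Int, Int)
f = \x ->
  let y  = 2 * x   -- first intermediate result
      z1 = y + 3   -- both later bindings reuse y
      z2 = y - 5
  in (z1, z2)

-- ghci> f 10
-- (23,15)
```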
do notation supports expressing the exact same computation inside the
Identity monad, that is, a monad that has no “side effects”.
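As a sketch of what that looks like (assuming the same illustrative arithmetic as a function that doubles its input and then adds 3 and subtracts 5; the name fM is hypothetical), the calculation can be written in the Identity monad from Data.Functor.Identity:

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- The same kind of calculation, with each intermediate value bound by <-.
-- The name fM and the arithmetic are illustrative assumptions.
fM :: Int -> Identity (Int, Int)
fM = \x -> do
  y  <- return (2 * x)   -- return lifts a pure value into Identity
  z1 <- return (y + 3)
  z2 <- return (y - 5)
  return (z1, z2)

-- ghci> runIdentity (fM 10)
-- (23,15)
```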
Each <- binding plays the part of one of the let bindings in
do notation. (For technical reasons we have to wrap every intermediate expression in
return to lift them into the
Identity monad.) Arrow notation supports a similar translation:
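A hedged sketch of that translation, using ordinary functions as the arrow (functions are an instance of Arrow). The name fA and the arithmetic are illustrative assumptions, and the Arrows extension has to be switched on:

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (returnA)

-- The calculation in arrow notation, with plain functions as the arrow.
-- The name fA and the arithmetic are illustrative assumptions.
fA :: Int -> (Int, Int)
fA = proc x -> do
  y  <- (2 *)      -< x   -- feed x into the function (2 *)
  z1 <- (+ 3)      -< y
  z2 <- subtract 5 -< y
  returnA -< (z1, z2)

-- ghci> fA 10
-- (23,15)
```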
In arrow notation
proc plays the part of “lambda”, i.e. backslash (\),
<- plays the part of let, and
-< feeds an argument into a “function”. We use
returnA instead of return.
The benefit of
do notation comes when we want to encode a computation that can’t be written using pure
let bindings alone. Here’s an example that uses the list monad to generate all coordinates within a given radius of the origin:
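The original listing is missing from this text, so here is a minimal sketch of such a computation. The names range and cM are assumptions, and the candidate coordinates are fixed to run from -5 to 5:

```haskell
import Control.Monad (guard)

-- All integers from -n to n (hypothetical helper).
range :: Int -> [Int]
range n = [-n .. n]

-- Coordinates within radius r of the origin, drawn from a fixed 11x11 grid.
cM :: Int -> [(Int, Int)]
cM = \r -> do
  x <- range 5
  y <- range 5
  guard (x*x + y*y <= r*r)   -- discard points outside the circle
  return (x, y)

-- ghci> cM 1
-- [(-1,0),(0,-1),(0,0),(0,1),(1,0)]
```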
We read this as: for each x in -5 to 5, and for each y in -5 to 5, keep the pair (x, y) only when x*x + y*y <= r*r.
Now let’s see how to use arrow notation to express the same computation. For trivial technical reasons we need to wrap the list monad to make it suitable for use with arrow notation. The wrapping and unwrapping don’t actually do anything except shuffle some type parameters around. In arrow computations we will use
K [] a b instead of
a -> [b]. We’ll use abbreviated versions of the relevant wrapping and unwrapping functions:
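The definitions themselves are not in this text. One plausible sketch, assuming K is simply a short name for Kleisli from Control.Arrow and that k and runK abbreviate its constructor and accessor:

```haskell
import Control.Arrow (Kleisli(Kleisli), runKleisli)

-- Assumption: K abbreviates Kleisli, so K [] a b wraps a function a -> [b].
type K = Kleisli

-- Wrap a monadic function so it can be used as an arrow.
k :: (a -> m b) -> K m a b
k = Kleisli

-- Unwrap an arrow back into a monadic function.
runK :: K m a b -> a -> m b
runK = runKleisli

-- ghci> runK (k (\n -> [n, n + 1])) 1
-- [1,2]
```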
Then we can use arrow notation to implement the radius list computation as follows:
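A sketch of how that might look, under the same assumptions as before: K abbreviates Kleisli, k and runK are its wrapper and unwrapper, and range and cA are hypothetical names.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (Kleisli(Kleisli), runKleisli, returnA)
import Control.Monad (guard)

-- Assumed abbreviations for the Kleisli wrapper.
type K = Kleisli

k :: (a -> m b) -> K m a b
k = Kleisli

runK :: K m a b -> a -> m b
runK = runKleisli

-- All integers from -n to n (hypothetical helper).
range :: Int -> [Int]
range n = [-n .. n]

-- Coordinates within radius r of the origin, in arrow notation.
cA :: K [] Int (Int, Int)
cA = proc r -> do
  x <- k range -< 5
  y <- k range -< 5
  k guard -< (x*x + y*y <= r*r)   -- the condition is an argument, not an action
  returnA -< (x, y)
```

Running it with runK cA 1 should list the five points at distance at most 1 from the origin. Note that x, y and r only ever appear to the right of -<.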
What’s the point of arrow notation? So far we have only seen that it is able to replicate some examples in
do notation. Well, the point is that arrow notation forbids some computations that
do notation allows. In particular all “arrow actions” must be “statically known”. That sentence was a mouthful! What does it mean? I am calling the expression that comes between <- and
-< in a row of arrow notation the “arrow action”. “Statically known” means that if we have a couple of rows of arrow notation, say x <- action1 -< e1 followed by y <- action2 -< e2,
then the expression
action2 cannot depend on
x or indeed anything bound on the left hand side of an arrow notation row.
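To make the scoping rule concrete, here is a small sketch using Kleisli [] as the arrow. The name twoRows and the two actions are made up for illustration. The second row may pass x as an argument, but the commented-out variant that mentions x inside the arrow action is rejected by GHC because x is not in scope there.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (Kleisli(Kleisli), runKleisli, returnA)

-- Two rows of arrow notation; both arrow actions are fixed in advance.
twoRows :: Kleisli [] Int (Int, Int)
twoRows = proc n -> do
  x <- Kleisli (\m -> [m, m + 1]) -< n
  y <- Kleisli (\m -> [m * 10])   -< x   -- fine: x is used to the right of -<
  -- y <- Kleisli (\m -> [m * x]) -< n   -- rejected: x is not in scope in the action
  returnA -< (x, y)

-- ghci> runKleisli twoRows 1
-- [(1,10),(2,20)]
```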
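This restriction has important practical consequences. For example, our Haskell IO system might be based on three primitives: getLineM, which prints a prompt and reads a line of input; printM, which prints a string; and writeFileM, which writes a string to a given file path.
Then we could use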
do notation to write a procedure which reads a line of user input and either prints something out or writes to a file based on that input.
procedureM :: String -> IO ()
procedureM = \prompt -> do
  input <- getLineM prompt
  if input == "Hello"
    then printM "You said 'Hello'"
    else writeFileM ("/tmp/output", "The user said '" ++ input ++ "'")

-- ghci> procedureM "Say something"
-- "Say something"
-- Hello
-- "You said 'Hello'"

-- ghci> procedureM "Say something"
-- "Say something"
-- Bye bye
-- (Writes to /tmp/output)
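procedureM relies on three primitives whose definitions do not appear in this text. A minimal sketch that is consistent with the session transcript, where the prompt and the reply are shown in quotes and so are presumably produced with print, might be:

```haskell
-- Assumed definitions, reconstructed from how procedureM uses them.

-- Show the prompt, then read one line of user input.
getLineM :: String -> IO String
getLineM prompt = do
  print prompt
  getLine

-- Print a string (quoted, as print does for String values).
printM :: String -> IO ()
printM = print

-- Write the contents to the file at the given path.
writeFileM :: (FilePath, String) -> IO ()
writeFileM (path, contents) = writeFile path contents
```

Each primitive takes exactly one argument (writeFileM takes a pair), which is what allows them to be wrapped as arrows with a single input.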
However, there is no way to express this in arrow notation using only the same primitives.
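A sketch of where the attempt breaks down, assuming the primitives are wrapped as Kleisli IO arrows and defined with print and getLine (both assumptions, as is the name echoA). Reading the input and then performing one fixed action works. Choosing the action from the input, as in the commented-out row, is rejected by GHC because input is not in scope to the left of -<.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (Kleisli(Kleisli), runKleisli)

-- Assumed definitions of two of the primitives.
getLineM :: String -> IO String
getLineM prompt = do
  print prompt
  getLine

printM :: String -> IO ()
printM = print

-- Reads a line and always prints it back: every arrow action is fixed.
echoA :: Kleisli IO String ()
echoA = proc prompt -> do
  input <- Kleisli getLineM -< prompt
  -- Rejected: the action would depend on input, which is not in scope here.
  -- (if input == "Hello" then Kleisli printM else Kleisli (\_ -> return ())) -< input
  Kleisli printM -< "The user said '" ++ input ++ "'"
```

Here input can still influence the argument passed to printM, just not which action is run.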
Why do we want to use arrows when they have these restrictions? Going into details would take a whole other blog post of its own, but I will mention briefly a few places where the full generality of monads is too much.
Firstly, an arrow-only interface can often allow you to take advantage of optimizations that a monadic interface could not. For example, parsers written using parser combinators can be made more memory efficient if we know statically the parsing action they are going to perform. Similarly, it can help to reduce the chance of memory leaks in functional reactive programming (e.g. with
netwire) if actions cannot depend in an unrestrained way on the result of previous actions.
In embedded domain specific languages this enforced non-dependence can make code generation more easily match a target language. For example, in the SQL-generating relational query language Opaleye, queries are built up from arrows called
QueryArr. Using an arrow rather than a monad allows the semantics of the domain specific language to more closely match the semantics of the underlying SQL language.
So in summary, arrows are a Haskell abstraction, similar to monads, and arrow notation is a way of writing arrow expressions which is similar to
do notation. However, there are some restrictions on what you can do with arrows that are not shared by monads. The benefit of the restriction is that you can often gain performance, or use your more specific knowledge about the structure of an arrow computation to your advantage.
This post is part of 24 Days of GHC Extensions - for more posts like this, check out the calendar.