I’ve been reading about Haskell for a year or so, but it’s only lately that I’ve really started to do any coding in the language. It’s really good to finally write some code of my own and see what the compiler thinks about it.

I had to get a few things set up on my computer before I was motivated to jump in; one was the Haskell Platform, another was the Haskell plugin for Eclipse, and the third was the Gtk2Hs packages for hooking Haskell up to OpenGL. I wanted to do something with Haskell that I could see on the screen, so I started working with The Haskell School of Expression: Learning Functional Programming Through Multimedia. Gtk2Hs offers a graphics library *soegtk* that specifically supports, and maintains API compatibility with, this book.

Anyway I thought I’d relate some of what I’ve been going through. I hope that even if you know Haskell, there may be something of interest here.

I do have a bit of a leg up because I’ve become familiar with certain basic things through reading, but there’s nothing like actually getting down to it. Still, I had to go lie down for a while after reading Exercise 2.4:

Define a function convex :: Shape -> Bool that determines whether or not its argument is a convex shape (although we are mainly interested in the convexity of polygons, you might as well define it for each kind of shape).

OK so at least I should show how far the book had brought me along before hitting me with this. We started with some type definitions:

type Radius = Float type Side = Float type Vertex = (Float, Float)

These are just typedefs. They don’t enforce anything. **Radius** and **Side** are both actually **Float**, so if you mix them up, Haskell won’t care. They’re just convenient aliases for existing types that help you express your intent, as we’ll see in the definition of **Shape** below.

That last type, **(Float, Float)**, is a tuple: a pair of Floats. A tuple has a fixed length, and a fixed type in each position.

data Shape = Rectangle Side Side | Ellipse Radius Radius | RtTriangle Side Side | Polygon [Vertex]

Unlike **type**, the **data** keyword does define a new type. This type is named **Shape**, and the things separated by the **|** symbols — **Rectangle**, **Ellipse**, **RtTriangle**, and **Polygon** — are constructors.

In Java, C++, and ActionScript, constructors must have the same name as the type they construct. In these object-oriented languages, types are realized as classes. In ActionScript, in fact, the constructor is so closely identified with the class that a class can have only one constructor. (This is true of haXe as well — but in haXe the constructor must be named **new**.) In Haskell, one constructor may have the same name as the type it constructs, but that wouldn’t be useful here because we’re not interested in constructing an abstract Shape. (I don’t know enough about Haskell to say that we would *never* want to do that, though.)

The type names following the constructor constitute its signature; thus **Rectangle** takes two parameters of type **Side**, and so on. Haskell doesn’t use parentheses to indicate function application. So clean.

That last type, **[Vertex]**, is a list of Vertices. Unlike a tuple, a list can have only one type of element, but it may have any number of elements, from none (the empty list **[]**) to infinity (for example the list of positive integers **[1..]**).

The next thing I want to mention, though I didn’t hear it from the SOE book, is this: constructors don’t have to have any parameters.

So what does that look like? How about this:

data Bool = True | False

And there you have it. Haskell has no need of a special Enum type or keyword. It’s just part of the system.

If you’re a Flash programmer who’s been looking at Enums in haXe you may have been bedazzled by the fact that the enums have constructors and the constructors can have parameters. Well, we’ve just seen that little mashup from the FP side of things.

Moving back to geometry, we now define our first function. The exercise is: “define **rectangle** in terms of **Polygon**.” (Types and constructors start with an upper-case letter; functions start with a lower-case letter. This is not just a convention.) First we’ll declare the type of the function:

rectangle :: Side -> Side -> Shape

OK this looks pretty weird from the imperative-language perspective. What it says is that rectangle takes two arguments of type Side and returns a Shape. But the -> symbol appears to have two different uses: separating the argument types from each other, and separating the argument types from the return type. (It’s like that in haXe, too!) We’re going to explain that little quirk after a while, but for now let’s just put up a stickie and push on.

rectangle s t = Polygon [(x, y), (-x, y), (-x, -y), (x, -y)] where x = s/2 y = t/2

This is pretty much plain English or plain Mathish anyway, so I’ll just note that in Haskell it’s considered cool to use one- and two-letter variable names.

You should now be able to define the functions **square** and **circle**.

How about another function:

area :: Shape -> Float

There are four different ways a Shape can be constructed, and they all have different formulas for calculating the area, so how do we write this function? You might think we’d have to say something like

area s = if s is a Rectangle then bla bla bla...

But no, what we do in Haskell is provide several equations for the function, with different parameter specs. When it’s time to execute the function, Haskell will go through the equations and use the first one that matches the actual arguments.

area (Rectangle s1 s2) = s1 * s2 area (RtTriangle s1 s2) = s1 * s2 / 2 area (Ellipse r1 r2) = pi * r1 * r2

The pattern matching at work here is just one of the ways that the flow of control is built into the language. In many ways, the programmer takes care of getting the data structures right, and the compiler handles the branching, the looping, … the CONCURRENCY… In this case we’ve seen Haskell take care of some branching for us.

Now we also want the area function to handle the last kind of Shape, the Polygon. Polygon is defined by a list, so before we can write that equation we have to back up a bit and take a digression about lists, how they’re represented, and how we can operate on them. This is going to show us how we can implement loops in a language lacking **for** or **while** — well, it’s no secret really, we use recursion.

In the **rectangle** definition above, we constructed an explicit list of four Vertices. When we pass a list into a function, we’re going to want to have a way to deconstruct it. So let’s introduce the **:** operator, pronounced “cons.” If **xs** is a list and **x** is another element, then **x:xs** is a new list whose head (first element) is **x** and whose tail (everything after the first element) is **xs**.

head (x:xs) == x tail (x:xs) == xs

**x** and **xs** are very common variable names in Haskell. **xs** is pronounced “exes.” A list of **xs**.

These next few examples are from Real World Haskell.

-- square every element in the list squares :: [Float] -> [Float] squares (x:xs) = x * x : squares xs squares [] = []

This function is complete, because the second pattern matches the empty list, and the first pattern matches any other list of Floats. Notice that **x:xs** matches when the list is just **[x]** — in this case **xs** is the empty list **[]**.

x:[] == [x]

Optimization nerd alert: recurring on the tail results in what? Tail recursion! Which gives me the opportunity to mention another thing that Haskell (and other functional language implementations) do for you: they optimize tail recursion to run in constant space. The recursion doesn’t grow the stack like it does in your imperative language. So we’re cool there.

upperCase :: String -> String upperCase (x:xs) = toUpper x : upperCase xs upperCase [] = []

By the way,

type String = [Char]

, so **squares** and **upperCase** have exactly the same structure. What’s different? The type of the list elements, and the function that’s applied to each element. Let’s generalize **squares** by extracting as a parameter the function that it applies:

-- f every Float in the list ff_it :: (Float -> Float) -> [Float] -> [Float]

**ff_it** is a function that takes two arguments:

1. a function that takes a **Float** and returns a **Float**, and

2. a list of **Float**s;

and returns a list of **Float**s.

ff_it f (x:xs) = f x : ff_it f xs ff_it _ [] = []

The underscore in the last pattern is a wildcard; if we fall through to this pattern we don’t need to match the f, because we won’t be using it anyway.

SO: if we define a function **square**

square :: Float -> Float square x = x * x

then instead of using our special **squares** function we can say

-- square every Float in the list: squared_xs = ff_it square xs

and similarly

negate :: Float -> Float negate x = -x -- negate every Float in the list: negated_xs = ff_it negate xs

Of course it gets tedious to write **ff_it** this and **ff_it** that all the time, so let’s bring back our **squares** function by defining it this way:

squares = ff_it square

Where’s the **xs**, though, right? We defined **ff_it** to take two arguments…

…

…OK so now we have to go back to the arrows in the function signatures and come clean about what they really mean.

Here’s the thing: every function in Haskell takes just one argument.

The **->** operator is right-associative. The signature

multiply :: Float -> Float -> Float

is identical to

multiply :: Float -> (Float -> Float)

which we already know means “a function that takes a Float and returns (a function that takes a Float and returns a Float).”

multiply x y = x * y add x y = x + y

As long as we supply at least one argument, any Haskell function will return something.

double :: Float -> Float double = multiply 2 increment :: Float -> Float increment = add 1

This is called *partial application*.

Can you define a Haskell function that takes no arguments? I don’t think so. Certainly there’s no function that just “does something” without returning anything. There’s no “Void” type for it to not return. (By the way, this is why in AS3 they changed **Void** to **void**: it’s not a type, it’s the lack of a type.)

Now let’s generalize **upperCase** the same way we generalized **squares**:

-- f every Char in the list fc_it :: (Char -> Char) -> [Char] -> [Char] fc_it f (x:xs) = f x : fc_it f xs fc_it _ [] = []

And we now have two functions that are structurally the same but differ in only one thing: that type that appears as the argument and return types of f and as the element type of the list.

It would be great if we had variables that represent types, right?

-- f every a in the list f_it :: (a -> a) -> [a] -> [a] f_it f (x:xs) = f x : f_it f xs f_it _ [] = []

Using our now-uncloaked knowledge of the -> operator, this is how we read the type declaration:

**f_it** is a function that takes:

.. a function that takes some type **a** and returns the same type;

and returns:

.. a function that takes a list of that same type **a** and returns a list also of that same type.

It’s customary to use letters from the beginning of the alphabet for type variables.

squares = f_it square upperCase = f_it toUpper

There’s one more generalization of **f_it** that’s cool to make that wasn’t revealed by **squares** and **upperCase**, because they return the same type as their last argument (talking about it in the old pre-enlightenment way). But they could return a different type, say **b**, and we’d be covered by this more general and equally strongly-typed definition:

map :: (a -> b) -> [a] -> [b] map f (x:xs) = f x : map f xs map _ [] = []

(Of course this includes the case where **a** and **b** are the same.)

Whew, all that just to get to the **map** function that’s in the standard library and that you probably know about already in some other language! Too fast?

So clean. Just look at what a hash ActionScript makes of it: AS3 function map(callback:Function, thisObject:Object = null):Vector.<t>

HaXe does a somewhat better job : Functional haXe

So I hope you noticed that

a. Every function has no side effects either in or out; its action can’t change or be changed by anything external. Its return value depends solely on its inputs. They’re called pure functions.

b. no variable ever changes value. They’re immutable.

Ya think that might be thread-safe? Ah, the concurrency!

Now no useful language can have only pure functions; there would be no way to do any IO! So Haskell does have impure functions, but it makes it very clear where the membrane lies. In fact you can tell an impure function from its signature alone: it will always have the word “IO” in it!

I see I haven’t gotten back to the **convex** function, or even to the area of a Polygon, but I’m afraid our time is up for this week. If you thirst for more of this saga, leave a comment begging for it.

PS If you had any trouble with the **square** and **circle** definitions, here they are:

square s = Rectangle s s circle r = Ellipse r r

Pingback: A Year Of Scala « blog.joa-ebert.com – Blog of Joa Ebert