Learning Erlang

by Jesse Farmer on Thursday, May 1, 2008

Last week I decided to learn Erlang, a functional programming language developed by Ericsson in 1987 for use in telecommunications environments. It's probably the strangest non-toy programming language I've ever tried to learn, so I thought I'd share some of my realizations.

Variables vs. Atoms

First, variables in Erlang are not like variables as most programmers think about them. Fortunately for me they're a lot like variables as mathematicians think of them.

That is, variables in Erlang are either bound or unbound, and bound variables cannot be rebound in the same context. This means that variables are write-once.

If I write Name = "Jim". I can't later write Name = "Betty". in the same context. Erlang will throw a matching error because it tries to match the right-hand side, "Betty", against the left-hand side, Name, which is already bound to "Jim".

When the left-hand side is an unbound variable, any match succeeds and the variable is bound to the right-hand side's value. But if the left-hand side is already bound, Erlang tries to match the right-hand side against the bound variable's value. Thus, if Name is bound to "Jim", both Name = "Jim". and "Jim" = Name. will succeed, but Name = "Betty". will fail. Weird, huh?
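To see the write-once rule outside the shell, here is a minimal sketch in a hypothetical module called rebind. It binds Name to "Jim", then matches Name against whatever value is passed in, catching the badmatch error that a failed match raises.

```erlang
-module(rebind).
-export([try_rebind/1]).

%% Name is bound once; the second "=" is a match against its value,
%% not a new assignment.
try_rebind(NewValue) ->
    Name = "Jim",
    try
        Name = NewValue,   % succeeds only if NewValue equals "Jim"
        ok
    catch
        error:{badmatch, _} -> {error, badmatch}
    end.
```

Calling rebind:try_rebind("Jim") returns ok, while rebind:try_rebind("Betty") returns {error, badmatch}.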

Second, "context" in Erlang means lexical scope. What's more, there is no global scope. This is to enforce a no-side-effects style of programming, I suppose.
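A small sketch of what this scoping means in practice (the scope module and its functions are hypothetical). The two X variables live in different functions, so each is bound exactly once without any conflict, and the fun returned by make_adder/1 keeps the N that was in scope where the fun was written.

```erlang
-module(scope).
-export([first/0, second/0, make_adder/1]).

%% This X is local to first/0 ...
first() ->
    X = 1,
    X.

%% ... and this X is an unrelated variable local to second/0.
second() ->
    X = 2,
    X.

%% The fun captures N from the enclosing function (lexical scoping).
make_adder(N) ->
    fun(X) -> X + N end.
```

For example, (scope:make_adder(5))(10) evaluates to 15.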

Finally, variables in Erlang always start with a capital letter (or an underscore). That is, Var is always a variable but var never is. If you execute var = 5. you'll get a matching error.

In this case Erlang treats var as an atom, and the atom var can't match the integer 5. Atoms fill the same role that symbols do in Ruby. Any literal that isn't another data type, variable, or function is an atom.

Atoms usually start with lower-case letters but you can also denote atoms by enclosing the name in single quotes. So, Var is a variable but 'Var' is an atom. var is an atom, too, and never a variable.
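A quick way to check these rules is the built-in is_atom/1. The sketch below (the atoms module is hypothetical) confirms that var and 'Var' are atoms, that the variable Var holds whatever it was bound to, and that quoting a name which is already a valid atom, like 'var', gives the very same atom.

```erlang
-module(atoms).
-export([check/0]).

check() ->
    Var = 5,                 % Var is a variable bound to an integer
    {is_atom(var),           % true: lower-case name, so an atom
     is_atom('Var'),         % true: quotes make it an atom
     is_atom(Var),           % false: Var is bound to 5
     var =:= 'var'}.         % true: same atom, written two ways
```

atoms:check() returns {true, true, false, true}.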

Data Types

In addition to atoms there are other data types. All the favorites are here, like integers, floats, and strings (although a string is really just a list of integer character codes). We also have funs, or "functional objects," which are anonymous functions.
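A short sketch of two of these in action, using a hypothetical data_types module: a fun bound to a variable and called like any function, and a string literal compared against the list of character codes it stands for.

```erlang
-module(data_types).
-export([demo/0]).

demo() ->
    Double = fun(X) -> 2 * X end,    % an anonymous function (a fun)
    {Double(21),                     % calling the fun
     "abc" =:= [97, 98, 99],         % a string is a list of char codes
     is_function(Double, 1)}.        % Double is a fun of arity 1
```

data_types:demo() returns {42, true, true}.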

Erlang also has two basic compound data types: lists and tuples. These are analogues of the same objects in Python. Items in both lists and tuples are separated by commas, but lists are enclosed by brackets, [], and tuples by curly braces, {}.

For example, [1,3.4,true] is a list and {person, 25, "Jason"} is a tuple.

There is no separate boolean type in Erlang. Instead, the atoms true and false are used.
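Here is a sketch that pokes at the list and tuple from the example above with a few standard BIFs (the compound module is hypothetical). length/1 counts list elements, tuple_size/1 and element/2 work on tuples, and is_atom/1 shows that true really is just an atom.

```erlang
-module(compound).
-export([demo/0]).

demo() ->
    List = [1, 3.4, true],
    Tuple = {person, 25, "Jason"},
    {length(List),        % 3 elements in the list
     tuple_size(Tuple),   % 3 elements in the tuple
     element(3, Tuple),   % tuple positions are 1-based: "Jason"
     is_atom(true),       % true is an atom ...
     is_boolean(true)}.   % ... that the BIF is_boolean/1 accepts
```

compound:demo() returns {3, 3, "Jason", true, true}.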

Assignment vs. Pattern Matching

In every other language I've ever used, assignment works something like this: var x = 5. In Erlang there is no assignment, at least not in this sense. Rather, Erlang matches patterns, and an unbound variable will match any value.

Consider the following (using the erl shell):

1> {ip, IP} = {ip, "192.168.0.1"}.
{ip,"192.168.0.1"}
2> IP.
"192.168.0.1"

Erlang is matching the left-hand and right-hand sides and trying to align them. IP is a variable (we know this because it starts with a capital letter), so it matches any value. ip is an atom (we know this because it starts with a lower-case letter), so it matches only the atom ip.

In this case alignment is possible because ip matches on both sides and IP is bound to the value "192.168.0.1".

Now consider this:

1> {foobar, IP} = {ip, "192.168.0.1"}.
** exception error: no match of right hand side value {ip,"192.168.0.1"}

Here we get an error because foobar and ip are different atoms, making a match impossible. If instead we did this:

1> {Atom, IP} = {ip, "192.168.0.1"}.
{ip,"192.168.0.1"}
2> Atom.
ip
3> IP.
"192.168.0.1"

Here there's no error because Atom is a variable, so it matches anything. It gets bound to ip, which is an atom.
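The same matching happens when patterns appear in a function's head, with each clause tried in turn. As a sketch (the tagged module and address/1 are hypothetical), the first clause plays the role of {ip, IP} and only accepts the atom ip, while the second plays the role of {Atom, IP} and accepts any tag.

```erlang
-module(tagged).
-export([address/1]).

%% The atom ip in the pattern must match exactly.
address({ip, IP}) -> {ok, IP};
%% Tag is a variable, so any first element matches here.
address({Tag, _Value}) -> {error, {unknown_tag, Tag}}.
```

tagged:address({ip, "192.168.0.1"}) returns {ok, "192.168.0.1"}, and tagged:address({foobar, "192.168.0.1"}) returns {error, {unknown_tag, foobar}}.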

Here's a more subtle example.

1> {A, {B, C}} = {first, {second, third}}.
{first,{second,third}}
2> A.
first
3> B.
second
4> C.
third
5> {X, Y} = {first, {second, third}}.
{first,{second,third}}
6> X.
first
7> Y.
{second,third}

If you understand why A, B, C, X, and Y get bound to the values they do, then I think you're well on your way to understanding how = works in Erlang.
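One way to test that understanding is a function whose clauses use the two patterns from the session above (the shapes module is hypothetical). Clauses are tried top to bottom, so a tuple whose second element is itself a 2-tuple hits the nested clause, while any other 2-tuple falls through to {X, Y}, exactly as Y swallowed {second, third} in the shell.

```erlang
-module(shapes).
-export([describe/1]).

describe({A, {B, C}}) -> {nested, A, B, C};  % like {A, {B, C}} = ...
describe({X, Y})      -> {flat, X, Y};       % like {X, Y} = ...
describe(_Other)      -> unknown.            % anything else
```

shapes:describe({first, {second, third}}) returns {nested, first, second, third}, and shapes:describe({ip, "192.168.0.1"}) returns {flat, ip, "192.168.0.1"} because a string is a list, not a tuple.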

Looping vs. Recursion

Since a bound variable can't be rebound, procedural-style looping with a counter is difficult in Erlang. i++ is not only verboten, it is syntactically invalid.

Instead loops are done through recursion. Here is the factorial function:

-module(factorial).
-export([factorial/1]).

% Clauses are separated by semicolons; the period ends the whole function.
factorial(0) -> 1;
factorial(N) ->
    N * factorial(N-1).

Briefly, -module defines an Erlang module, which is the mechanism by which the language supports code separation. -export lists the functions in this module that code outside the module is allowed to call. The /1 after factorial on the export line is the function's arity, the number of arguments it takes.

As with variable binding, Erlang uses pattern matching to choose between a function's clauses, trying them in order. A call with the argument 0 matches the first clause, since 0 is an integer literal. Any other call to factorial with a single argument falls through to the second clause, and N is bound to that argument.
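The same recursion covers the kind of counting loop that would use i++ elsewhere: instead of incrementing a counter, each call passes I + 1 to the next one. Below is a sketch with a hypothetical sum_to/1 that adds up 1 through N. It uses a guard (when I > N), which restricts when a clause may match; guards aren't covered above, so treat that as an assumption of this sketch. Like the factorial above, it is not tail recursive.

```erlang
-module(loop).
-export([sum_to/1]).

%% Roughly: total = 0; for (i = 1; i <= n; i++) total += i;
sum_to(N) -> sum_to(1, N).

sum_to(I, N) when I > N -> 0;              % past the end: stop
sum_to(I, N) -> I + sum_to(I + 1, N).      % "i++" becomes I + 1
```

loop:sum_to(4) returns 10, and loop:sum_to(0) returns 0.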

Tail Recursion

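Since iterative loops are difficult in Erlang, making sure your recursive functions are tail recursive is important. This means the last thing a recursive function does should be the call to itself. Erlang optimizes such tail calls, so a tail-recursive function runs in constant stack space, just like a loop.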

The factorial function above is not tail recursive: the last operation it performs is the multiplication, *, rather than the call to factorial.
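Expanding a small call by hand, using the clauses of the first factorial, shows the problem: each multiplication has to wait for the recursive call inside it to return, so the pending work grows with N.

```latex
\begin{aligned}
\mathrm{factorial}(3) &= 3 \times \mathrm{factorial}(2) \\
  &= 3 \times (2 \times \mathrm{factorial}(1)) \\
  &= 3 \times (2 \times (1 \times \mathrm{factorial}(0))) \\
  &= 3 \times (2 \times (1 \times 1)) = 6
\end{aligned}
```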

To fix this, we need to rewrite factorial to use an accumulator.

-module(factorial).
-export([factorial/1]).

factorial(N) ->
    factorial(N, 1).

% Acc carries the product built up so far; the recursive call is last.
factorial(0, Acc) ->
    Acc;
factorial(N, Acc) ->
    factorial(N-1, N*Acc).

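Because Erlang identifies a function by both its name and its arity, factorial/1 and factorial/2 are two distinct functions, so we don't even have to change the interface. We export only the one-argument factorial, and the two-argument helper stays private to the module.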
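The accumulator trick isn't specific to factorial. As a further sketch (the list_sum module is hypothetical), here is a tail-recursive sum over a list. The pattern [H | T] splits a non-empty list into its first element H and the remaining list T, and the empty list [] ends the recursion. In real code the standard library's lists:sum/1 already does this job.

```erlang
-module(list_sum).
-export([sum/1]).

%% Public interface: sum/1. The helper sum/2 is not exported.
sum(List) -> sum(List, 0).

sum([], Acc) -> Acc;                      % nothing left: return the total
sum([H | T], Acc) -> sum(T, Acc + H).     % tail call with updated Acc
```

list_sum:sum([1, 2, 3, 4]) returns 10.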

The Future

Erlang is about concurrency and message-passing, so for my first exercise I'm going to try to create some simple network services.

Also, does anyone know of a GeSHi plugin for Erlang?