Thursday, June 12, 2008

Let There Be Local Variables

Actually, there aren't any. There are only function parameters. But there's a workaround: use functions.



Suppose you're writing a token analyser for some input. 'token-generator is a generator function. You need to determine the type of a token - return 'string if you think it's a string, 'char, 'int, etc. Naïvely, you might



(def token-type (token-generator)
(get-token-type (token-generator)))

(def get-token-type (tok)
(if (is (tok 0 #\")) 'string
(is (tok 0 #\#) 'char)
(is-digit (tok 0)) 'int))


So you get your input from 'token-generator, but you're really interested in 'tok. So the function 'get-token-type provides the local variable 'tok for you to work with. The disadvantage is that your namespace gets polluted with a zillion little functions that are only useful to their immediate neighbours. Wouldn't it be nice to, like, click Refactor > Inline Method like you can in a real language?



Well, you can.



(def token-type (token-generator)
((fn (tok)
(if (is (tok 0 #\")) 'string
(is (tok 0 #\#) 'char)
(is-digit (tok 0) 'int))) (token-generator)))


The ( (fn (tok) (if blah)) (token-generator) ) is a call to an anonymous function that declares the 'tok parameter for you to work with.



Now your namespace is no longer polluted, but you have this damned ugly thing in the middle of your function. No ordinary human is expected to read this.



Enter 'let



'let is a macro that transforms its humanly-readable input into something resembling the mess above.



(def token-type (token-generator)
(let tok (token-generator)
(if (is (tok 0 #\")) 'string
(is (tok 0 #\#) 'char)
(is-digit (tok 0) 'int))))


'let lets you pretend you are actually declaring a local variable. Coming from another language, you might think that this is, really and truly, a local variable. It's not. Behind the scenes, it's an anonymous function creating a lexically scoped 'tok parameter. But you don't care. You have your local variable.



Here is the pattern:



(let a b stuff)


becomes:



((fn (a) stuff) b)



'with is a more general purpose macro that does the same thing, but for multiple variables.



(with (a 1 b 2 c 3) stuff)


becomes:



((fn (a b c) stuff) 1 2 3)


'let exists because in the special case of binding just one variable, we don't need parens to separate the variable bindings from the body of the code. Just because you write in lisp doesn't mean you actually like parens ...

Thursday, June 5, 2008

To Do: If Only ...

One of the simplest and most pervasive macros in arc is "do". "do" is useful when you need to evaluate a sequence of expressions, but you only have room for one. A common case is when evaluating conditionals:



(if a b
c d
e)

This corresponds to the following in java:
if (a) { 
return b;
} else if (c) {
return d;
} else {
return e;
}


In java, we can execute as many statements as we desire within a { } block; arc however allows only a single expression as return value. So if we wanted



if (a) { 
b();
c();
return d;
} else if (e) {
return f;
} else {
return g;
}


we can't just write



(if a b c d
e f
g)


Hence, do.



(if a (do b c d)
e f
g)


do expands into a call to a no-arg function containing b c d -



((fn () b c d))


- it defines an anonymous zero-arg function and immediately calls it. Hence, b c d are grouped in a single expression, and will work as desired inside a conditional:



(if a ((fn () b c d))
e f
g)

Thursday, May 29, 2008

arc functions

So the long and the short of it is, we get arc to evaluate expressions. Atoms mostly evaluate to themselves: strings, characters, and numbers.

arc> "foo"
"foo"
arc> 12
12

Symbols are also atoms, but have the special property that they may be defined to have some other value. Quoting a symbol means it doesn't get evaluated.

arc> 'bar
bar
arc> (set bar "toto")
"toto"
arc> bar
"toto"

Lists are another story: when arc evaluates a list, it assumes the first item of the list is a function, and the remainder are arguments.

arc> (+ 1 2)
3
arc> (expt 9 2)
81

The first item is not necessarily a function: it might also be a reference to a macro, or in anarki, an object for whose type a function has been prepared via defcall. For another day. The first place in a list is usually referred to as "functional position". So, the procedure for evaluating (a b c d), is to evaluate each of a, b, c, and d (order is unspecified). a should evaluate to a function, which is then called with the values of b, c and d as arguments.

But, the object in functional position doesn't have to be a symbol pointing to a function. It could also be an inline, anonymous function.

arc> ((fn (a b) (+ a b)) 23 67)
90

Disadvantage: it's dead ugly, and totally unreadable. Nest a few of these and you have parenthesis soup. It turns out this is one of the fundamental organising principles of arc code. So why bother?

  • group expressions

  • declare local variables

  • iteration

  • all sorts of wonderful things

- all this in a neat and concise way, without namespace clutter or parameter hell. Except, we need macros to make it happen.

Macros: a future post.

Thursday, May 22, 2008

Lexical Scope and Closures

Lexical scope is not a radical departure from java: the concept already exists. When we create an inner instance class in java, the inner class has access to all the instance variables of all enclosing classes, and even to local variables if it's an anonymous inner class. Similarly, in arc, a locally-defined function has access to all the variables of all enclosing functions. This set of variables and their values is called the closure of the function.




(def powerfn (y) (fn (x) (expt x y)))

(set square (powerfn 2)) ; --> square is (fn (x) (expt x y)) with y = 2
(set cube (powerfn 3)) ; --> cube is (fn (x) (expt x y)) with y = 3

(square 5) ; --> 25
(cube 5) ; --> 125


In this example, the variable y retains its value - 3 or 2 - long after powerfn has exited, in the closure of the function returned by powerfn (the function that is assigned to square and cube)



When rainbow finds a locally-defined function in the course of evaluation, it creates a Closure object which encapsulates the function in question, and its closure. The Closure thus created can subsequently be stored in a variable, and invoked anywhere at any time. The implementation isn't optimal however: it keeps the entire closure, rather than just the closed variables that are actually used by the function in question.

Thursday, May 15, 2008

Tail-Call Optimisation

Tail-Call Optimisation lets you write




public void forever(long n) {
System.out.println("looking for the last number: " + n)
if (isTheLastNumber(n)) {
return;
}
forever(n + 1);
}


- and never get a StackOverflowError. Yes. Never. When a recursive call to forever exits, there is nothing left for the parent invocation of forever to do, so its stack frame can be discarded, and if it is, the stack simply doesn't grow.



Here's an arc example:




(def a ()
(b)
(e))

(def b ()
(c)
(d))

(def e ()
(f)
(g))

(def g ()
(h)
(i)
(j))


Remeber, the return value of a function is the value of the last expression of the function. So the return value of (b) is (d).



After the call from b to (c) returns, execution continues with the call from b to (d). However, when (d) returns, b has nothing more to do, so it can return directly to a.



So, when we call c, we tell it to return to b at Instruction 2. However, when we call d, instead of telling it to return to b, we tell it to return to a. The stack frame representing b at this point is no longer referenced: the continuation for d is pointing back to a, bypassing b altogether.



Ultimately, by the time we get to the call to j, we will tell j to return to whoever a's caller was, because neither a, nor e, nor g, make use of the value of (j). Presumably, a's caller does.



I hope this is making sense.



Here is the implementation in rainbow. Inside the current continuation, the variable f is the function we're evaluating, and f.nth(i) gives us the nth expression of that function. To interpret the nth expression, rainbow does (approximately) this:




ArcObject expression = f.nth(index++);
Interpreter.interpret(f.last(index) ? caller : this, expression);


So if I'm on the last expression, I tell the interpreter to return to my caller instead of to me once it's finished. Mission accomplished.



See wikipedia's tail-recursion article for more information. btw, rainbow doesn't do TCO modulo cons (see article). Arc on scheme, it seems, does. I will have to figure that out.

Thursday, May 8, 2008

Continuations: Arc vs Java

What's a continuation? Despite much googling on the topic, continuations remained opaque to me until I stumbled upon Sam Ruby's amazingly excellent Continuations for Curmudgeons. Please read that article instead of this, it will be much more useful.



Here's a really fast description of continuations in java terms. Imagine in java we had first-class functions (like javascript). In other words, we could say



Function p = myObject.toString // no parens!

// later ...
p(); // calls myObject.toString()


Now, let's jump a little further: imagine "return" is not just a keyword, but it is a special built-in function that is magically available in the body of every method, just like "this" and "super" are. And we have first-class functions. So, we can write





public String foo(int n) {
this.cc = return;
return("foo " + n); // "return" is a function
}


public void bar() {
System.out.println("I have received " + foo(10));
System.out.println("end of bar");
}

...

bar(); // displays "I have received foo 10", then "end of bar"
cc("haha!") // displays "I have received haha!", then "end of bar", forever


With the call to cc you are re-entering bar at the instruction immediately after the call to foo(), and not only that, when bar returns, it will return to whoever its caller was when it was invoked the first time, and so on all the way up the stack. Not only that, the stack prior to the call to cc is completely discarded and gc'd (unless, of course, there's a stored continuation somewhere pointing to that stack).



Being a java dude, this was a shocking revelation for me. Worse than goto, a total abomination, a devilish contortion of everything that is right and proper in design.



But I was undeterred. Sam Ruby's article gave me what I needed to implement continuations in rainbow, although it meant re-writing everything. Because java doesn't allow me just store "return", I need to explicitly handle a stack in java code, so that I can "return" to it at any point, just in case someone decides to store it in a continuation. If the stack is stored in this way, it is not subject to garbage collection, so the entire stack - local variables, parameters, the stacktrace - is preserved.



So when rainbow invokes an arc function, it creates a new continuation for the invoked function, and passes the new continuation the current continuation, so the new continuation knows where to return to when it's done.



This results in java code written in Continuation-Passing style, or CPS. CPS and continuations are separate concepts, but feel the same. Here is a quick CPS tutorial in Java. Normally we write this:




int calculateSomething(int p, int q) {
return p * q;
}

...

int x = calculateSomething(p, q);
System.out.println("result is " + x);


Trivial, no? Here's the same thing in CPS:




void calculateSomething(int p, int q, Continuation c) {
c.receive(p * q);
}

...

Continuation c = new Continuation() {
public void receive(Object o) {
System.out.println("result is " + o);
}
}

calculateSomething(p, q, c);


The idea is that your methods must never return a value, they should instead pass the intended return value to a "Continuation" parameter destined to receive it. The responsibility of the Continuation object is to continue whatever was in progress. Here, the Continuation object calls System.out.println, something that was achieved in the second line of the original version. So why would you want to write in CPS? In Java: you wouldn't, it's a nightmare, unless you happen to be writing a Lisp interpreter.



One difference between CPS and true continuations is that a true continuation never returns: the caller's stack is discarded, and the stack in scope for the called continuation is resurrected. CPS in java is a total hack. In the example above, any code following "c.receive(p * q)" is meaningless. The call to receive must be the last statement executed in the method - it's replacing "return". Using CPS in java, it's easy either to forget to invoke receive, or to forget to return immediately afterwards.



A slight problem for rainbow is that CPS functions never return: the last instruction is a call to the previous continuation. So, normally, the java stack grows and grows. To avoid this, rainbow executes arc in a while(true) loop. Each function invocation simply stores a reference to the function to invoke along with a reference to the caller and the lexical context - and then returns. The java stack unwinds, the loop loops, and the waiting function gets executed.

Thursday, May 1, 2008

Introducing Rainbow

Rainbow is an almost full implementation in java of Paul Graham's new lisp, arc. The arc compiler (ac.scm) is written in a little over 1000 lines of scheme - so I figured I could do it in a little over 10000 lines of java. In fact, I got in at about 5000.



Rainbow interprets its internal representation of arc code, and does not compile to bytecode. Hence it is not quite as fast as it might be. Rainbow sports continuations and tail-call-optimisation (which I believe Clojure, the alternative jvm lisp, lacks). These two features, being completely alien to the jvm, perhaps mean that compiling entirely to bytecode will remain out of reach for the moment.



Why java? It's a language I know reasonably well, having used it almost exclusively over the last ten years. Java gives us garbage collection and well-defined thread handling for free, as well as access to some great libraries. On the other hand, java is probably the most politically incorrect language for an arc interpreter: it might cause arc to curl up like a bimetallic strip, a bit like writing a ruby compiler in cobol. What a concept.



Actually, here's the truth. In Arc at 3 weeks (2001), Paul Graham declares that arc is for good programmers, unlike java, which is for average programmers. I admit it, I am driven by pure hubris.



A goal of arc is to define as much as possible of the language in the language itself. So some keywords that we would expect to be a fundamental part of the language (for example, def and mac, for defining top-level functions and macros) are defined in arc, and not provided by the compiler at all. Rainbow respects this goal and replaces only the scheme compiler while leaving the rest of the language intact.



This blog will be the story of my travels in arc-land, of the challenges faced both in building rainbow and in using arc, from the perspective of someone who has been thinking in java for far too long now ...



Here are the best places to go for more arc info:




  • Arc - the official site, plenty of background. Be sure to read the tutorial.

  • the arc forum

  • arcfn.com - solid and thorough documentation of the language by Ken Shirriff, along with his own experiences integrating arc and opengl

  • Anarki, the github "unofficial arc" repository, based on Paul Graham's release, with community-contributed features, fixes, and libraries.

  • Rainbow is also on github