Thursday, May 29, 2008

arc functions

So the long and the short of it is, we get arc to evaluate expressions. Atoms mostly evaluate to themselves: strings, characters, and numbers.

arc> "foo"
"foo"
arc> 12
12

Symbols are also atoms, but have the special property that they may be defined to have some other value. Quoting a symbol means it doesn't get evaluated.

arc> 'bar
bar
arc> (set bar "toto")
"toto"
arc> bar
"toto"

Lists are another story: when arc evaluates a list, it assumes the first item of the list is a function, and the remainder are arguments.

arc> (+ 1 2)
3
arc> (expt 9 2)
81

The first item is not necessarily a function: it might also be a reference to a macro, or in anarki, an object for whose type a function has been prepared via defcall. For another day. The first place in a list is usually referred to as "functional position". So, the procedure for evaluating (a b c d), is to evaluate each of a, b, c, and d (order is unspecified). a should evaluate to a function, which is then called with the values of b, c and d as arguments.

But, the object in functional position doesn't have to be a symbol pointing to a function. It could also be an inline, anonymous function.

arc> ((fn (a b) (+ a b)) 23 67)
90

Disadvantage: it's dead ugly, and totally unreadable. Nest a few of these and you have parenthesis soup. It turns out this is one of the fundamental organising principles of arc code. So why bother?

  • group expressions

  • declare local variables

  • iteration

  • all sorts of wonderful things

- all this in a neat and concise way, without namespace clutter or parameter hell. Except, we need macros to make it happen.

Macros: a future post.

Thursday, May 22, 2008

Lexical Scope and Closures

Lexical scope is not a radical departure from java: the concept already exists. When we create an inner instance class in java, the inner class has access to all the instance variables of all enclosing classes, and even to local variables if it's an anonymous inner class. Similarly, in arc, a locally-defined function has access to all the variables of all enclosing functions. This set of variables and their values is called the closure of the function.




(def powerfn (y) (fn (x) (expt x y)))

(set square (powerfn 2)) ; --> square is (fn (x) (expt x y)) with y = 2
(set cube (powerfn 3)) ; --> cube is (fn (x) (expt x y)) with y = 3

(square 5) ; --> 25
(cube 5) ; --> 125


In this example, the variable y retains its value - 3 or 2 - long after powerfn has exited, in the closure of the function returned by powerfn (the function that is assigned to square and cube)



When rainbow finds a locally-defined function in the course of evaluation, it creates a Closure object which encapsulates the function in question, and its closure. The Closure thus created can subsequently be stored in a variable, and invoked anywhere at any time. The implementation isn't optimal however: it keeps the entire closure, rather than just the closed variables that are actually used by the function in question.

Thursday, May 15, 2008

Tail-Call Optimisation

Tail-Call Optimisation lets you write




public void forever(long n) {
System.out.println("looking for the last number: " + n)
if (isTheLastNumber(n)) {
return;
}
forever(n + 1);
}


- and never get a StackOverflowError. Yes. Never. When a recursive call to forever exits, there is nothing left for the parent invocation of forever to do, so its stack frame can be discarded, and if it is, the stack simply doesn't grow.



Here's an arc example:




(def a ()
(b)
(e))

(def b ()
(c)
(d))

(def e ()
(f)
(g))

(def g ()
(h)
(i)
(j))


Remeber, the return value of a function is the value of the last expression of the function. So the return value of (b) is (d).



After the call from b to (c) returns, execution continues with the call from b to (d). However, when (d) returns, b has nothing more to do, so it can return directly to a.



So, when we call c, we tell it to return to b at Instruction 2. However, when we call d, instead of telling it to return to b, we tell it to return to a. The stack frame representing b at this point is no longer referenced: the continuation for d is pointing back to a, bypassing b altogether.



Ultimately, by the time we get to the call to j, we will tell j to return to whoever a's caller was, because neither a, nor e, nor g, make use of the value of (j). Presumably, a's caller does.



I hope this is making sense.



Here is the implementation in rainbow. Inside the current continuation, the variable f is the function we're evaluating, and f.nth(i) gives us the nth expression of that function. To interpret the nth expression, rainbow does (approximately) this:




ArcObject expression = f.nth(index++);
Interpreter.interpret(f.last(index) ? caller : this, expression);


So if I'm on the last expression, I tell the interpreter to return to my caller instead of to me once it's finished. Mission accomplished.



See wikipedia's tail-recursion article for more information. btw, rainbow doesn't do TCO modulo cons (see article). Arc on scheme, it seems, does. I will have to figure that out.

Thursday, May 8, 2008

Continuations: Arc vs Java

What's a continuation? Despite much googling on the topic, continuations remained opaque to me until I stumbled upon Sam Ruby's amazingly excellent Continuations for Curmudgeons. Please read that article instead of this, it will be much more useful.



Here's a really fast description of continuations in java terms. Imagine in java we had first-class functions (like javascript). In other words, we could say



Function p = myObject.toString // no parens!

// later ...
p(); // calls myObject.toString()


Now, let's jump a little further: imagine "return" is not just a keyword, but it is a special built-in function that is magically available in the body of every method, just like "this" and "super" are. And we have first-class functions. So, we can write





public String foo(int n) {
this.cc = return;
return("foo " + n); // "return" is a function
}


public void bar() {
System.out.println("I have received " + foo(10));
System.out.println("end of bar");
}

...

bar(); // displays "I have received foo 10", then "end of bar"
cc("haha!") // displays "I have received haha!", then "end of bar", forever


With the call to cc you are re-entering bar at the instruction immediately after the call to foo(), and not only that, when bar returns, it will return to whoever its caller was when it was invoked the first time, and so on all the way up the stack. Not only that, the stack prior to the call to cc is completely discarded and gc'd (unless, of course, there's a stored continuation somewhere pointing to that stack).



Being a java dude, this was a shocking revelation for me. Worse than goto, a total abomination, a devilish contortion of everything that is right and proper in design.



But I was undeterred. Sam Ruby's article gave me what I needed to implement continuations in rainbow, although it meant re-writing everything. Because java doesn't allow me just store "return", I need to explicitly handle a stack in java code, so that I can "return" to it at any point, just in case someone decides to store it in a continuation. If the stack is stored in this way, it is not subject to garbage collection, so the entire stack - local variables, parameters, the stacktrace - is preserved.



So when rainbow invokes an arc function, it creates a new continuation for the invoked function, and passes the new continuation the current continuation, so the new continuation knows where to return to when it's done.



This results in java code written in Continuation-Passing style, or CPS. CPS and continuations are separate concepts, but feel the same. Here is a quick CPS tutorial in Java. Normally we write this:




int calculateSomething(int p, int q) {
return p * q;
}

...

int x = calculateSomething(p, q);
System.out.println("result is " + x);


Trivial, no? Here's the same thing in CPS:




void calculateSomething(int p, int q, Continuation c) {
c.receive(p * q);
}

...

Continuation c = new Continuation() {
public void receive(Object o) {
System.out.println("result is " + o);
}
}

calculateSomething(p, q, c);


The idea is that your methods must never return a value, they should instead pass the intended return value to a "Continuation" parameter destined to receive it. The responsibility of the Continuation object is to continue whatever was in progress. Here, the Continuation object calls System.out.println, something that was achieved in the second line of the original version. So why would you want to write in CPS? In Java: you wouldn't, it's a nightmare, unless you happen to be writing a Lisp interpreter.



One difference between CPS and true continuations is that a true continuation never returns: the caller's stack is discarded, and the stack in scope for the called continuation is resurrected. CPS in java is a total hack. In the example above, any code following "c.receive(p * q)" is meaningless. The call to receive must be the last statement executed in the method - it's replacing "return". Using CPS in java, it's easy either to forget to invoke receive, or to forget to return immediately afterwards.



A slight problem for rainbow is that CPS functions never return: the last instruction is a call to the previous continuation. So, normally, the java stack grows and grows. To avoid this, rainbow executes arc in a while(true) loop. Each function invocation simply stores a reference to the function to invoke along with a reference to the caller and the lexical context - and then returns. The java stack unwinds, the loop loops, and the waiting function gets executed.

Thursday, May 1, 2008

Introducing Rainbow

Rainbow is an almost full implementation in java of Paul Graham's new lisp, arc. The arc compiler (ac.scm) is written in a little over 1000 lines of scheme - so I figured I could do it in a little over 10000 lines of java. In fact, I got in at about 5000.



Rainbow interprets its internal representation of arc code, and does not compile to bytecode. Hence it is not quite as fast as it might be. Rainbow sports continuations and tail-call-optimisation (which I believe Clojure, the alternative jvm lisp, lacks). These two features, being completely alien to the jvm, perhaps mean that compiling entirely to bytecode will remain out of reach for the moment.



Why java? It's a language I know reasonably well, having used it almost exclusively over the last ten years. Java gives us garbage collection and well-defined thread handling for free, as well as access to some great libraries. On the other hand, java is probably the most politically incorrect language for an arc interpreter: it might cause arc to curl up like a bimetallic strip, a bit like writing a ruby compiler in cobol. What a concept.



Actually, here's the truth. In Arc at 3 weeks (2001), Paul Graham declares that arc is for good programmers, unlike java, which is for average programmers. I admit it, I am driven by pure hubris.



A goal of arc is to define as much as possible of the language in the language itself. So some keywords that we would expect to be a fundamental part of the language (for example, def and mac, for defining top-level functions and macros) are defined in arc, and not provided by the compiler at all. Rainbow respects this goal and replaces only the scheme compiler while leaving the rest of the language intact.



This blog will be the story of my travels in arc-land, of the challenges faced both in building rainbow and in using arc, from the perspective of someone who has been thinking in java for far too long now ...



Here are the best places to go for more arc info:




  • Arc - the official site, plenty of background. Be sure to read the tutorial.

  • the arc forum

  • arcfn.com - solid and thorough documentation of the language by Ken Shirriff, along with his own experiences integrating arc and opengl

  • Anarki, the github "unofficial arc" repository, based on Paul Graham's release, with community-contributed features, fixes, and libraries.

  • Rainbow is also on github