Monday, April 14, 2008

The Phase Concept

Anyone who's been following my blog for a while may have seen a pattern by now. Everything I've written about programming languages has a theme, which when extrapolated, has one logical conclusion: to create a compiler for a programming language that is a good tool for creating other programming languages (possibly mini languages otherwise known as APIs or DSLs) with a GUI editor that is aware of the semantics of the language and whose target language is well-established with many existing tools.

The compiler project I mentioned last post is a start of that. However, it is by no means the final product. First of all, I conjecture that an s-expression-based source language will lend itself as a good target language for a graphical code-editor built later.

Secondly, the idea of having PHP as the compiler's target language was based on the desire to take advantage of the hordes of PHP code already out there for creating web apps. However, after creating an initial prototype, it became obvious that PHP's lack of support for closures is a huge obstacle in creating the compiler whose source language has closures. I can't imagine not having closures, so PHP is out. (It's not that it can't be done, but it would be significantly more work to compile away the closures.) This is good though, because it forces me to re-write (i.e. re-design) the compiler.

I also decided against compiling to Python. Even though I have good feelings towards it, I can't justify using it when it restricts closures to be one-liners in an otherwise imperative language. I was also considering Common Lisp as a target language. The thing is, using it as a target language leaves this new language with all the same problems that Common Lisp has, and so in a way that would defeat the purpose of building on top of something supported by armies of coders. Put another way, CL's armies are significantly smaller than the armies of other languages.

As much as I don't want to admit this, Ruby is starting to look like the best option for a target language.

So for those of you wondering if I'm going to release my little prototype, I see no reason to. It was written in Haskell as a proof of concept. The s-expression parser was taken from the Lisp interpreter I wrote, and I simply added the translation to PHP.

I have concerns though about certain features like eval. My first inclination was to include it, as I plan on having something like macros à la Lisp. That could slow down the development of a prototype, and thus feedback, so I may cut it from the first version. Including eval creates a bootstrapping problem. It requires me to either write the compiler in the target language or include enough language primitives to implement eval in the source language itself and re-write eval in my new language. This is a sad cut, but it's necessary to get a feel for the language quickly.

So what is this "language" I keep referring to? What's special about it? What will its purpose be? It's just an idea I've been toying with, and this prototyping is meant to try to figure out if it's a good idea or not.

Every language lends itself to writing code in a certain way. Java, for example, lends itself to writing code in an object-oriented way. You could, however, write Java code that looks more like garbage-collected C code with classes used only as namespaces. Or you could write functional code in Java, passing around "functors" built out of anonymous classes. But the reason people tend towards writing object-oriented code in Java is because Java lends itself to an OO design. It makes writing OO code cheap — so cheap that it changes the way you think about algorithms so that they fit an OO model.

But me, I already think of everything as a compiler. I see every program as a compilation from inputs to outputs. A giant function if you will. Of course, when a program is big, you break it up into multiple functions, each with its own inputs and outputs. On a larger scale, you break up groups of functions among modules, where each module's interface defines a mini DSL, and each module's implementation is the compiler for it.

In this way, every program is a composition of mini compilers between mini languages. Oftentimes data in a program will pass through many intermediate stages as it flows from input to output through various transformations. In the same way that C++ code gets compiled to C code, then to object code, and then finally to machine code, each stage that data flows through is a compilation phase.

With a C++ compiler, the data happens to be C++ source code which gets translated into machine code. However, a clock program is a compiler from the OS's API for retrieving the system's time to a graphical readout of the time. A database engine is a compiler from SQL statements (select, update, delete, etc.) to result sets. (Order of execution is significant, as updates affect the results of compiling select statements in the future.)

A text editor is an advanced compiler with many phases of compilation. Ignoring the transformation (or compilation) of keystrokes to key codes at the hardware and OS levels, text editors transform key presses (and perhaps mouse input) into formatted text, formatted text into the graphical layout of the formatted text, formatted text into linear byte-streams for saving, formatted text into postscript or something suitable for printing.

I already see everything as a compiler, so why not have a language that lends itself to writing programs in this paradigm. A language that makes it cheap to express computations as the composition of multiple phases of translations from one language to another.

It's all about dataflow and how that data changes form as it passes from one phase to the next. So for now, "phase" is its code name.

1 comment:

_ said...

My god man, read Lisp in small pieces.