The greatest contribution someone could make to the programming language and compiler community is not to make a great programming language, but to make the tools necessary for making new programming languages cheap — so cheap, that programming languages and the tools that allow their efficient use (e.g. compilers, debuggers, code-editors, etc.) become abundant.
And once they become abundant, people will start to see how interfaces are everything, how everything is a language for a compiler, and how there's no real answer to code's worst enemy — size — other than abstracting away parts of it and then specifying which parts you need in a particular situation via a language.1
What tools would make implementing new programming languages cheap? Let's make a list.2
- Parsing and Printing (Bi-directional Mappings between Text and Abstract Syntax)
- Tree/Graph Editor
- Inference Engine (Theorem Prover)
- Data-flow Analysis
- Incremental Transformation Engine (Rule-based Transformation, Incremental Compilation, Execution Stepper)
- Dependency Analysis
- Unified Runtime System (Memory Allocation, Garbage Collection, Runtime Safety Verification)
Parsing and PrintingThis may be a first-step before a true code-editor is created, but I suspect that it will still be desirable in some cases to stringize code. In addition, I imagine parsing code, perhaps not entire programs but small code-blocks, will still be useful in a code editor.
What is completely obvious to a compiler-writer (in fact, I knew of someone in college whose side project was exactly this) yet others seem completely oblivious to, is that you can make "skins" for your code to format it differently in text, but have it parse down to the same underlying representation (i.e. abstract syntax). This way, Joe can use curly braces to specify blocks while I can use whitespace indentation, and it will represent the same code. The same for every other parsing variation, small or large.
What's cool about this is that even though Joe and I are typing in code differently, I can view Joe's code in my format and vice versa. So no more conforming to (practically uncheckable) coding standards enforced by your tyrannical employer.
I imagine specifying these via bi-directional mappings between strings and abstract syntax data-structures. In the past, I've always had to write one piece of code to parse and another piece of code to print, and that's just silly. Of course, there are corner-cases which are specific to one as opposed to the other, and I imagine some kind of declarative DSL for specifying these.
Tree/Graph EditorCode at its core is a tree. Once it is parsed, it is represented internally as an abstract-syntax tree, or absyn tree. If you consider the semantics of the language though, variable references for example, refer back to nodes closer to the root of the tree. So in this sense, it is a graph.
The core of most IDEs today is a text-editor. This is silly though since the programmer is actually editing code, which is this tree or graph data-structure. In particular, when you edit the name of a variable reference, the editor should distinguish between changing what that node refers to and re-naming the node, as overwriting a variable in text changes what the node refers to; to re-name, you have to manually replace all references to it, including taking into account scoping rules of the language. The editor should understand when you're trying to wrap an expression in another expression, instead of relying on you to go to the beginning, add an open-paren, go to the end, add a close-paren.
Sure, it's a matter of convenience, but it's also a matter of error-prone-ness. I can't count the number of times I've had bugs due to copying and pasting textual code. If the editor I was using was aware that the structure I was copying was a graph, it would indicate to me which nodes the pasted code were referring to, allowing me to see my bug immediately.
The editor must be aware of the semantics of the language you are editing and give continuous feedback of changes made. If it weren't aware of the semantics of the language, how would it give you feedback on what you are editing that is specific to the language you are using. Thus, your code editor must be tied very tightly with your compiler. However, a general tree-editor would be a huge step in the right direction.
Inference EngineAn inference engine is critical for any code-level automation. For example, type checking, type inference, and semantic-preserving optimization. This is no simple feat to create, as it is, at its core, a theorem prover.
Instead of re-writing a theorem prover for every compiler, a compiler writer should only need to specify the axioms and rules of inference for his language to the industrial-strength theorem-prover. Simple things like constant folding and more complicated things like induction-variable elimination can fall out of this.
Data-flow AnalysisOf course, some standard optimizations like dead-code elimination (if your language is eager) and semantic checks like whether variables have been initialized (if your language allows such things) rely on data-flow analysis. Some compilers even use this to optimize memory allocation, as in whether to allocate something on the stack or the heap, by determining whether a reference to an object may escape a local block.
Again, this is another instance of something that can be done once generally, but it should also be tied in with the incremental transformation engine to allow for only re-compiling or re-executing pieces of code that have changed. This could also be extremely useful in a static sense where the editor displays to the programmer what pieces of code are affected by the flow of changes made (since the last save or last revision for example).
Incremental Transformation EngineThis is the meat of the compiler: the transformation engine. It transforms source code to object code. Or in the case of an interpreter, source code to immediately-executed instructions. It must be incremental in the global sense in that it must handle incremental building for a more responsive development environment. But it must also be incremental in the local sense in that it must allow for the programmer to pause, step, observe, or otherwise control the transformation for debugging purposes.
Home-grown interpreters usually start out all-or-nothing; they either transform or they don't. The reason is simply because the most straight-forward and clean implementation is a recursive traversal of an absyn tree. This leaves no room for stepping through code and displaying the current state to the programmer. A debugger is an after-thought, and rightly so, as its cost outweighs its benefit when a language implementor is creating a language for himself or others who have access to the language the compiler is written in.
I imagine writing pattern-matching rules for the transformation engine similar to the way someone might write a recursive traversal of an absyn tree in ML, however, the transformation engine will automatically handle incremental stepping and observing of the transformation process for interaction with the programmer when debugging. In addition, I imagine many additions to this language compared to ML which allow short-hand for specifying common cases of transformation, like only specifying the differences in base cases and the engine recursively applying the transformations to all other nodes.
Dependency AnalysisBased on dependency analysis, the transformation engine can automatically determine which code to (re-)compile and in which order. This should include external dependencies as well.
In many projects I've worked on in the past, circular dependencies have prevented standard tools from working as expected. As a work-around, we were forced to transform our module layouts and handle the dependencies manually. Although circular dependencies in general can lead to confusing module-interfaces, there are some cases when it just makes sense, and the transformation engine should handle circular dependencies by simply using a two-pass algorithm.
Unified Runtime SystemThis has got to be the most depressing topic for all language implementors. If you implement your own runtime system for memory allocation and garbage collection, what hope do you ever have of your language interacting with the rest of the world — something that is necessary both for production apps and popular adoption. If you use an existing runtime system, you are stuck on its platform and forced to use its (often unsuitable) virtual machine.
Parrot, looking good. Perhaps this is on its way already.
As for the runtime safety verification, I would love to see something that used proof-carrying code. For those who don't know about this, it is a way to guarantee runtime safety in the way that traditional sandbox-like VMs do at runtime, but statically, so that runtime overhead is eliminated and native machine-code can be run directly with all the same safety guarantees.3
All of these tools are non-language-specific, and they would help make creating new programming languages economically viable for software developers and startups in particular. In the end, this would advance the field of programming languages immensely, more than the development of any single language could.
1. I mean "language" in the most general sense, as it's sometimes suitable to abstract something away into a library and have your language be function calls into that library. Other times it makes sense to have a full-blown Turing-complete language. Anyway, this is why PL design is so important — it's really being done everywhere even though most people don't realize it.
2. What's missing from this list?
3. Peter Lee, a professor I worked for in college, was one of the inventors of proof-carrying code, and it turns out something so amazingly useful is practically a special case of type checking. However, this also requires proof-generation, another use of a good theorem prover. In the sci-fi stories that talk about computers programming themselves — they're talking about theorem provers. ...It's sad for us CSers. Most people are simply unaware of the amazing things currently in development.