Sunday, December 30, 2007

The Greatest Contribution to Programming Languages

Thinking about tools, it occurred to me that the greatest contribution to programming languages is not any single language. No, not Lisp, C, ML, Haskell, Qi, or any of the more obscure yet equally great languages.

The greatest contribution someone could make to the programming language and compiler community is not to make a great programming language, but to make the tools necessary for making new programming languages cheap — so cheap that programming languages and the tools that allow their efficient use (e.g. compilers, debuggers, code-editors, etc.) become abundant.

And once they become abundant, people will start to see how interfaces are everything, how everything is a language for a compiler, and how there's no real answer to code's worst enemy — size — other than abstracting away parts of it and then specifying which parts you need in a particular situation via a language.1

What tools would make implementing new programming languages cheap? Let's make a list.2
  • Parsing and Printing (Bi-directional Mappings between Text and Abstract Syntax)
  • Tree/Graph Editor
  • Inference Engine (Theorem Prover)
  • Data-flow Analysis
  • Incremental Transformation Engine (Rule-based Transformation, Incremental Compilation, Execution Stepper)
  • Dependency Analysis
  • Unified Runtime System (Memory Allocation, Garbage Collection, Runtime Safety Verification)
Some more details...

Parsing and Printing

This may be a first step before a true code-editor is created, but I suspect that it will still be desirable in some cases to stringize code. In addition, I imagine parsing code, perhaps not entire programs but small code-blocks, will still be useful in a code editor.

What is completely obvious to a compiler-writer (in fact, I knew someone in college whose side project was exactly this), yet something to which others seem completely oblivious, is that you can make "skins" for your code to format it differently in text but have it parse down to the same underlying representation (i.e. abstract syntax). This way, Joe can use curly braces to specify blocks while I use whitespace indentation, and both represent the same code. The same goes for every other parsing variation, small or large.

What's cool about this is that even though Joe and I are typing in code differently, I can view Joe's code in my format and vice versa. So no more conforming to (practically uncheckable) coding standards enforced by your tyrannical employer.

I imagine specifying these via bi-directional mappings between strings and abstract syntax data-structures. In the past, I've always had to write one piece of code to parse and another piece of code to print, and that's just silly. Of course, there are corner-cases which are specific to one as opposed to the other, and I imagine some kind of declarative DSL for specifying these.
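As a rough sketch of the "skins" idea (in Python, with names I made up for illustration), here is one abstract syntax with two interchangeable printers. A real system would derive both the parser and the printer from a single bi-directional specification; this only shows the printing half, which is enough to see that Joe's curly braces and my indentation are just two views of the same tree:

```python
# One AST, two "skins". The tree is the single source of truth;
# each skin is just a rendering of it (a full system would also
# derive a parser from the same spec).

from dataclasses import dataclass

@dataclass
class Call:
    name: str

@dataclass
class Block:
    stmts: list

def print_curly(node, indent=0):
    # Joe's skin: blocks are delimited by curly braces.
    pad = "    " * indent
    if isinstance(node, Call):
        return f"{pad}{node.name}();"
    body = "\n".join(print_curly(s, indent + 1) for s in node.stmts)
    return f"{pad}{{\n{body}\n{pad}}}"

def print_indent(node, indent=0):
    # My skin: blocks are expressed purely by indentation.
    pad = "    " * indent
    if isinstance(node, Call):
        return f"{pad}{node.name}()"
    return "\n".join(print_indent(s, indent + 1) for s in node.stmts)

ast = Block([Call("setup"), Block([Call("loop")])])
print(print_curly(ast))
print(print_indent(ast))
```

Both printers walk the identical tree, so I can view Joe's code in my format and vice versa without either of us changing a thing.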

Tree/Graph Editor

Code at its core is a tree. Once it is parsed, it is represented internally as an abstract-syntax tree, or absyn tree. If you consider the semantics of the language, though, variable references, for example, refer back to nodes closer to the root of the tree. So in this sense, it is a graph.

The core of most IDEs today is a text-editor. This is silly, though, since the programmer is actually editing code, which is this tree or graph data-structure. In particular, when you edit the name of a variable reference, the editor should distinguish between changing what that node refers to and renaming the node. Overwriting a variable in text changes what the node refers to; to rename, you have to manually replace all references to it, taking into account the scoping rules of the language. The editor should also understand when you're trying to wrap an expression in another expression, instead of relying on you to go to the beginning, add an open-paren, go to the end, and add a close-paren.
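The rename-vs-retarget distinction is easy to see once references are actual pointers into the graph rather than spellings. A minimal sketch (my own toy classes, not any real editor's API):

```python
# When code is a graph, a reference *points at* its binder node.
# Renaming the binder is then a single edit, and every reference
# follows along; retargeting is a different edit entirely.

class Binder:
    def __init__(self, name):
        self.name = name

class Ref:
    def __init__(self, binder):
        self.binder = binder
    def show(self):
        # A reference has no spelling of its own; it displays its binder's.
        return self.binder.name

x = Binder("x")
total = Binder("total")
use = Ref(x)
assert use.show() == "x"

# Rename: one edit on the node; the reference still points at the same binder.
x.name = "count"
assert use.show() == "count"

# Retarget: change *what* the reference points to, not how it is spelled.
use.binder = total
assert use.show() == "total"
```

In a text buffer these two operations are indistinguishable keystrokes; in the graph they are different edits, which is exactly the distinction the editor should surface.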

Sure, it's a matter of convenience, but it's also a matter of proneness to error. I can't count the number of times I've had bugs due to copying and pasting textual code. If the editor I was using were aware that the structure I was copying was a graph, it would indicate to me which nodes the pasted code was referring to, allowing me to see my bug immediately.

The editor must be aware of the semantics of the language you are editing and give continuous feedback on changes made. If it weren't aware of the semantics of the language, how could it give you feedback on what you are editing that is specific to the language you are using? Thus, your code editor must be tied very tightly to your compiler. However, a general tree-editor would be a huge step in the right direction.

Inference Engine

An inference engine is critical for any code-level automation: for example, type checking, type inference, and semantics-preserving optimization. This is no simple feat to create, as it is, at its core, a theorem prover.

Instead of re-writing a theorem prover for every compiler, a compiler writer should only need to specify the axioms and rules of inference for his language to an industrial-strength theorem-prover. Simple things like constant folding and more complicated things like induction-variable elimination can fall out of this.
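To make the constant-folding case concrete, here is a sketch of what I mean by specifying a rule rather than a pass. The generic rewriter is fixed; the only language-specific part is the rule that fires when both children are constants (toy representation, mine):

```python
# Constant folding as one declarative rewrite rule handed to a generic
# bottom-up rewriter, instead of a hand-written pass in each compiler.

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    # node is an int literal, a variable name (str), or (op, left, right)
    if isinstance(node, (int, str)):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)       # rewrite children first
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)             # the rule fires
    return (op, left, right)                    # otherwise rebuild unchanged

# (x * (2 + 3)) folds the constant subtree but keeps the variable.
assert fold(("*", "x", ("+", 2, 3))) == ("*", "x", 5)
```

The point is that the traversal machinery is written once; each language contributes only its rules.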

Data-flow Analysis

Of course, some standard optimizations like dead-code elimination (if your language is eager) and semantic checks like whether variables have been initialized (if your language allows such things) rely on data-flow analysis. Some compilers even use this to optimize memory allocation, as in whether to allocate something on the stack or the heap, by determining whether a reference to an object may escape a local block.
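The initialization check, for instance, is a small forward data-flow analysis: facts about which variables are definitely assigned flow through the program and are intersected at join points. A toy version over a made-up statement representation:

```python
# A tiny "definitely assigned" analysis: each branch is analyzed
# separately and the facts are intersected at the join point, so a
# variable assigned on only one path is flagged as possibly uninitialized.

def check(stmts, assigned=frozenset()):
    """stmts: list of ('assign', var), ('use', var), or ('if', then, else)."""
    warnings = []
    for s in stmts:
        if s[0] == "assign":
            assigned = assigned | {s[1]}
        elif s[0] == "use":
            if s[1] not in assigned:
                warnings.append(s[1])
        else:  # ('if', then_branch, else_branch)
            w1, a1 = check(s[1], assigned)
            w2, a2 = check(s[2], assigned)
            warnings += w1 + w2
            assigned = a1 & a2   # only vars assigned on *both* paths survive
    return warnings, assigned

# x is assigned in the then-branch only, so its later use is suspect.
prog = [("if", [("assign", "x")], []), ("use", "x")]
warnings, _ = check(prog)
assert warnings == ["x"]
```

Dead-code elimination and escape analysis are the same shape of computation with different facts flowing, which is why one general engine can serve all of them.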

Again, this is another instance of something that can be done once, generally. But it should also be tied in with the incremental transformation engine to allow for re-compiling or re-executing only the pieces of code that have changed. This could also be extremely useful in a static sense, where the editor displays to the programmer which pieces of code are affected by the flow of changes made (since the last save or last revision, for example).

Incremental Transformation Engine

This is the meat of the compiler: the transformation engine. It transforms source code to object code. Or in the case of an interpreter, source code to immediately-executed instructions. It must be incremental in the global sense in that it must handle incremental building for a more responsive development environment. But it must also be incremental in the local sense in that it must allow for the programmer to pause, step, observe, or otherwise control the transformation for debugging purposes.

Home-grown interpreters usually start out all-or-nothing; they either transform or they don't. The reason is simply that the most straightforward and clean implementation is a recursive traversal of an absyn tree. This leaves no room for stepping through code and displaying the current state to the programmer. A debugger is an afterthought, and rightly so, as its cost outweighs its benefit when a language implementor is creating a language for himself or for others who have access to the language the compiler is written in.

I imagine writing pattern-matching rules for the transformation engine similar to the way someone might write a recursive traversal of an absyn tree in ML; however, the transformation engine will automatically handle incremental stepping and observing of the transformation process for interaction with the programmer when debugging. In addition, I imagine many additions to this language compared to ML that allow short-hand for specifying common cases of transformation, like specifying only the differences in base cases and having the engine recursively apply the transformations to all other nodes.
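The key structural difference from the all-or-nothing recursive interpreter is writing the engine in *small steps*: each call performs one rewrite and returns, so pausing, stepping, and observing fall out for free. A sketch over the same toy arithmetic trees as above (my own representation):

```python
# A small-step transformation engine: one reduction per call means a
# debugger can pause between steps and show every intermediate state.

def step(node):
    """Perform exactly one reduction on an arithmetic tree; None if done."""
    if isinstance(node, int):
        return None
    op, left, right = node
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    for i, child in ((1, left), (2, right)):
        reduced = step(child)
        if reduced is not None:          # reduce the leftmost reducible child
            out = list(node)
            out[i] = reduced
            return tuple(out)
    return None

def trace(node):
    # What an execution stepper would show: every intermediate state.
    states = [node]
    while (nxt := step(states[-1])) is not None:
        states.append(nxt)
    return states

assert trace(("*", ("+", 1, 2), ("+", 3, 4))) == [
    ("*", ("+", 1, 2), ("+", 3, 4)),
    ("*", 3, ("+", 3, 4)),
    ("*", 3, 7),
    21,
]
```

A big-step recursive evaluator computes the same answer but offers no seam at which to stop; the stepper's seams are exactly what the debugger needs.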

Dependency Analysis

Based on dependency analysis, the transformation engine can automatically determine which code to (re-)compile and in which order. This should include external dependencies as well.

In many projects I've worked on in the past, circular dependencies have prevented standard tools from working as expected. As a work-around, we were forced to transform our module layouts and handle the dependencies manually. Although circular dependencies in general can lead to confusing module-interfaces, there are some cases when it just makes sense, and the transformation engine should handle circular dependencies by simply using a two-pass algorithm.
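For the acyclic parts this is just topological sorting, and the interesting design point is what happens on a cycle: detect it and report which modules are involved, so the engine can fall back to a two-pass strategy for exactly those modules. A sketch using Python's standard library:

```python
# Build order from a dependency graph: topologically sort what we can,
# and on a cycle report the modules involved so a two-pass strategy can
# be applied to just that group.

from graphlib import TopologicalSorter, CycleError

def build_order(deps):
    """deps maps each module to the set of modules it depends on."""
    try:
        return list(TopologicalSorter(deps).static_order()), None
    except CycleError as e:
        return None, e.args[1]   # the chain of modules forming the cycle

order, cycle = build_order({"app": {"util"}, "util": set()})
assert order == ["util", "app"]

order, cycle = build_order({"a": {"b"}, "b": {"a"}})
assert order is None and set(cycle) == {"a", "b"}
```

The module names here are placeholders; the shape of the answer (an order, or a named cycle) is the part that matters.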

Unified Runtime System

This has got to be the most depressing topic for all language implementors. If you implement your own runtime system for memory allocation and garbage collection, what hope do you ever have of your language interacting with the rest of the world — something that is necessary both for production apps and for popular adoption? If you use an existing runtime system, you are stuck on its platform and forced to use its (often unsuitable) virtual machine.

Parrot, looking good. Perhaps this is on its way already.

As for the runtime safety verification, I would love to see something that used proof-carrying code. For those who don't know about this, it is a way to guarantee runtime safety in the way that traditional sandbox-like VMs do at runtime, but statically, so that runtime overhead is eliminated and native machine-code can be run directly with all the same safety guarantees.3

All of these tools are non-language-specific, and they would help make creating new programming languages economically viable for software developers and startups in particular. In the end, this would advance the field of programming languages immensely, more than the development of any single language could.
1. I mean "language" in the most general sense, as it's sometimes suitable to abstract something away into a library and have your language be function calls into that library. Other times it makes sense to have a full-blown Turing-complete language. Anyway, this is why PL design is so important — it's really being done everywhere even though most people don't realize it.
2. What's missing from this list?
3. Peter Lee, a professor I worked for in college, was one of the inventors of proof-carrying code, and it turns out something so amazingly useful is practically a special case of type checking. However, this also requires proof-generation, another use of a good theorem prover. In the sci-fi stories that talk about computers programming themselves — they're talking about theorem provers. ...It's sad for us CSers. Most people are simply unaware of the amazing things currently in development.

Friday, December 21, 2007

Design — A Life Philosophy

Every problem is a design problem full of design choices. Even the choice to take a hands-off approach is a design choice. Every decision you make in your life — what to eat for lunch, what to do as a profession, who to hang out with on Saturday night — is a design decision. Each and every one of us does our best to design our life. Why would you ever choose something unless you were convinced that it would further you towards your idea of a successful life, whatever your idea of a successful life may be?

Joel gives a nice description of what design is in his 3-parter. Design is all about making trade-offs — as is life. Everything has an opportunity cost.

Sunday, December 16, 2007

On Practicality and Tool-Making

A comment on my last post on design and programming languages spawned the following.

Q: Are you implementing some of your ideas?

A: It's always in the back of my mind to design and implement my dream programming language. This blog is in part my outlet for ideas that come streaming into my mind about programming languages. In fact, it's my only outlet. The more specialized you get, the fewer people you can find who consciously value your field; most people don't even know it exists. Consequently, I would just love to have someone to bounce ideas around with who actually has a clue what I'm talking about.

That said, I'm not actively working on something at the moment, and I think that warrants an explanation.

Back when I started programming, I primarily liked to make things that were useful on a day-to-day basis. In making such things, I started to see patterns and got the idea of making tools to make creating them easier and quicker. That eventually led me to programming languages, the medium of expression of all software. As a result, I spent practically my entire undergraduate career studying type-theory and compilers.

To this day, I am more convinced than ever that PL design is one of the most important things to {software as we know it}.

However, given all that, tool-making can be a huge side-track. Because like all things, there is a pattern of pattern-discovery and tool-making.
  1. You use the tools you have to solve a problem.
  2. You discover patterns in the use of those tools.
  3. You make new tools that eliminate those patterns.
  4. You return to step 1 with your new tools.
But if you're like me and see patterns everywhere, you could end up spending most of your time in step 3, not step 1. Especially when you see patterns in the tool-making process and start to recurse on sub-goals. And if your ultimate goal is to solve your original problem, there's something wrong.

The bottom line is that hard work — sometimes an obscene amount of it — pays off if it's focused on your goal. And the whole tool-making branch only makes sense towards your bottom-line if {the amount of work spent making your tool} plus {the amount of work it takes to accomplish your goal with the tool} is less than {the amount of work it would take without the tool}. This simple accounting formula has told me that making compilers and new programming languages is just not worth it given my goals. The cost of making a compiler and all the tools necessary to be productive with a new language is just so high that it dominates the inequality.

This can be frustrating, especially for someone who wants to create something beautiful.

But it is a matter of practicality.

Now, this may seem like an elaborate excuse to not work on something hard and important. But to me, it is the reason why I should work on other hard and important things.

I'm not trying to discourage anyone from working on compilers or designing new programming languages. On the contrary. If your goal is to make the best programming language in the history of programming languages — a so-called hundred-year language — then by all means, work on that! Seriously, please do; we are all in need.

All I'm really saying is that everything has an opportunity cost, and you should evaluate everything in terms of your goal, whatever that may be. Sometimes that means putting aside your own desires, like the desire to make a tool. But in the end, it's worth it.

Sunday, December 9, 2007

1, 2, n

It's the 1, 2, n rule in software development.

You do something once. You do something twice. And the next time you do it, you generalize it for all times henceforth.

Waiting longer to generalize is painful at best. In the worst case, it leads to maintenance nightmares as when the same thing is done 14 times in 14 different places, each time slightly different than the rest.

Generalizing too soon however leads to other problems. Everyone's done this at one point in their coding career or another. Hey, this is a great idea! Why hasn't someone else done it before? I'll code it completely general the first time around so that it'll be done right, from the start. And then a few weeks of coding go by, you start to actually use it, and you find out the hard way that either it was a really bad idea or the API is so terrible that you have to re-write it. Because basically, you didn't iterate your design.

Code is all about design. And good design is re-design.

This is a big part of the reason why programming languages are so far behind. The amount of work that it currently takes to create (a compiler for) a programming language is so high, that the time it takes for a language designer to get feedback on his design is years. But by that time (if he was lucky enough to actually get users), he's already invested so much into it that he is unlikely to throw it away and start from scratch — something a good designer must do. On top of that, even if he wanted to start from scratch, it would take many months to de-program his mind from thinking in the previous language. (Like the first time a Java programmer learns Haskell and tries to write a loop!)

An IDE is part of a language. Error messages are part of a language. Interacting with other systems, written in other languages, is part of a language. They're all part of the interface of the language. And the interface is everything.

It's a chicken-vs.-egg problem though. Because to get all these tools made, based on history, requires that a language be popular. But how do you get people to use a language if it doesn't have the most basic tools like a debugger?

JavaScript is doing it right now. JavaScript tools — which make using the language no longer so painful — are cropping up everywhere these days. And the way it did that was by being the only choice if developers want their code to be executed in a browser window. Most languages aren't so lucky. So what is a language designer to do?

As a designer, I can't simply make a command-line compiler for my language. I have a vision of what editing code is like. And it's not with a text-editor. The closest thing I've ever seen to a true code-editor is Eclipse for editing Java code, but Eclipse has been around long enough for us to learn a few lessons. I can already hear the Emacs junkies crying about how Emacs is the ultimate editor. But again, it's still a glorified text-editor.

A true code-editor must absolutely be aware of the semantics of the language you are editing. It is inseparable from the compiler, because the compiler knows how to interpret your code, and when something is amiss, it must be able to feed that back to the programmer.

Writing code is all about design. And a designer needs feedback — as much feedback as possible. The quicker he gets that feedback the better, because he can make any necessary changes to the design only after he's gotten feedback on his current design.

So any code-editor that doesn't give immediate feedback on parse errors and requires that you do a full build is simply crap! How can you possibly work like that?! Ditto for type errors, link errors, and test-assertion errors. People see it as obvious when the old-timers tell stories about how they used to submit compilation jobs to mainframes on punch-cards, only to find out the next day that they forgot a semicolon and had to re-submit the entire job. Yet when it comes to where we are, here and now, people are blind.

Frankly, I'm getting tired of it. I want to see test data flow through my code as I write it and the unit-test results change from red to green as I fix a bug. I want to be able to highlight two blocks of code and have my code-editor factor out a function, replacing the two blocks with applications. I want bottlenecks to be highlighted in-editor based on profiling results. And compilation should be done completely in the background so I don't have to wait after making a change to use it. And this read-eval-print-loop that's all the rage nowadays might actually be useful if I could use it in the context of a particular point of my code.

Is it really that hard? Someone please tell me I'm a visionary because I can't believe that this can't be done. Nothing a little universal substitution and computing power can't solve.

Wednesday, October 31, 2007

Quantity Implies Quality

Quantity implies quality.

Example: In recent news, a professor searched 2,985,984 possible candidates for a universal Turing machine. In an earlier era when computing power was scarce, that may have been too many to search in a practical amount of time. But this guy found the answer. (Well, maybe.)

Example: Chess-playing is currently a prediction problem, but if resources increase enough to search the entire set of possible moves (which is theoretically possible as far as I understand the rules), winning becomes a search problem — which can always be solved. Think of tic-tac-toe. Or even checkers. In some games, a player with an abundance of computational power can look ahead to see which moves will lead to a win. Or at least a tie.

...I know what you're thinking. Wait, chess-playing is still a prediction problem. As with checkers, perfect look-ahead will tell you which moves will lead to the highest number of winning game-states compared to the total number of states. But your opponent, if he's smart enough, could always choose one of the ones that isn't a win for you. So it is more about predicting which moves your opponent will take. But I think this takes chess back to its roots — the fundamentals of a game.
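The "perfect look-ahead turns a game into search" point can be made tiny. Here is a sketch for a miniature Nim-style game (take 1 or 2 stones; taking the last stone wins) where exhaustive look-ahead labels every position win or loss with certainty — no prediction involved:

```python
# With enough computing power, a game is a pure search problem: minimax
# with full look-ahead labels every position as a forced win or loss.

from functools import lru_cache

@lru_cache(maxsize=None)
def is_win(stones):
    """True iff the player to move can force a win (last stone taken wins)."""
    return any(not is_win(stones - take) for take in (1, 2) if take <= stones)

# Exhaustive search reveals the structure: multiples of 3 are forced losses.
assert [n for n in range(10) if not is_win(n)] == [0, 3, 6, 9]
```

Chess is the same computation in principle; only the size of the game tree forces us to approximate it with prediction today.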

The point is... some problems which are fundamentally search problems get transformed into prediction problems because the search-space is too large to be searched with current resources in a practical amount of time. Thus, the potential of a final right answer is lost.

By increasing resources to an abundance and using the law of Universal Substitution, you can re-convert problems back to search problems.

Quantity really does make a difference.

Tuesday, October 23, 2007

Everything Is a Compiler

All algorithms can be classified into one of the following.
  • Search Engines
  • Prediction Engines
  • Networks
  • Operating Systems
  • Compilers
This becomes obvious when you look at what programs do from a high level: search, predict, communicate, manage resources, and translate/transform.1


Searching

Searching is when you start with a question and your goal is to determine the answer. The answer may be fuzzy, but there is a definite best answer that can be tested. In other words, given two potential answers, a search engine has the ability to test which answer is the better one. Moreover, there are a finite number of potential answers. Examples include string searching and graph shortest-path algorithms.


Predicting

Predicting, like searching, is when you start with a question and your goal is to find a good answer. Unlike search, there are potentially infinite possible answers and/or the best answer cannot be tested explicitly. Heuristics must inevitably be used, but there is no guarantee that the chosen heuristics will lead to a good answer. Like fitness functions in a genetic algorithm (which falls under the prediction category), local maxima can be hit without indication. Current chess-playing algorithms are also prediction engines since they cannot search the entire game-space of moves and must guess which move will more likely lead to a win.
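The "local maxima hit without indication" failure mode is easy to demonstrate. A sketch of hill-climbing on a fitness function I made up with two peaks: started near the small bump, the heuristic settles there and never signals that a better peak exists elsewhere:

```python
# Why prediction engines can silently stop at a local maximum:
# hill-climbing has no way to know a taller peak exists out of reach.

def fitness(x):
    # Two peaks: a local one at x=2 (height 1) and the global one at x=8 (height 3).
    return max(0, 1 - abs(x - 2)) + max(0, 3 - abs(x - 8))

def hill_climb(x, step=0.5):
    while True:
        best = max((x - step, x, x + step), key=fitness)
        if best == x:
            return x      # no neighbor is better: the climber stops, silently
        x = best

# Started at 2.0 it stops on the small peak; started at 7.0 it finds the big one.
assert hill_climb(2.0) == 2.0
assert hill_climb(7.0) == 8.0
assert fitness(hill_climb(2.0)) < fitness(hill_climb(7.0))
```

Nothing in the first run's output distinguishes "best answer" from "best answer nearby", which is exactly the hazard with heuristic prediction.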


Networks

Networks are made up of multiple entities that are independent in some respects, and the goal is to transfer/transport resources between those entities. In the case when the resource is information, the goal is to communicate. Any problem where entities do not have perfect knowledge of the entire system can be a networking problem. For example, a P2P search engine must use a networking algorithm or protocol to determine the existence of a target on a foreign entity, assuming the searching entity does not have an index, which would amount to all-encompassing (possibly perfect) knowledge.

Operating Systems

Operating systems manage limited resources. When a resource is abundant compared to demands for it, management of the resource is not needed. When it is too rare to meet demands, management can not help. But when a resource is scarce, care must be taken to ensure that it is not abused, especially when entities with differing goals are competing for the resource. This is the goal of operating systems: to prevent abuse of scarce resources.


Compilers

Compilers take an input and give an output that somehow corresponds to the input. In the case of a programming language, a compiler takes in a program in a source language and outputs a program in a target language whose semantics are the same as those of the input program. In this way, the input and output correspond. However, the correspondence need not be one-to-one. The compiler acts as a map from one language to another. Moreover, the language need not be a programming language, but something more general, like a GUI. In this context, the languages that a compiler translates between are languages in the most general sense — any symbols. If the symbols of the language represent something and have meaning, then we usually call this translation; if they don't, we call it transformation.


How is compilation different from search?

Fundamentally, both search engines and compilers transform an input into a corresponding output.2 The difference is in their approach. With search, there is a finite answer space and the answer is determined via exhaustive comparison with the input, whether that comparison is done at query-time or indexing-time. With compilation, the answer is determined via formulaic transformation of the input. There is a formula or method to get from the source to the target without ever touching all possibilities.
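One way to see the contrast on a single problem (my own toy example): computing an integer square root by exhaustive comparison versus by a formulaic method that never enumerates the answer space.

```python
# Same input, same corresponding output; opposite approaches.

import math

def isqrt_search(n):
    # Search: test every candidate in the finite answer space against the input.
    return max(k for k in range(n + 1) if k * k <= n)

def isqrt_compile(n):
    # "Compilation": a direct method (Newton's iteration) that steps
    # straight toward the answer without touching all possibilities.
    x = n
    while x * x > n:
        x = (x + n // x) // 2
    return x

assert all(isqrt_search(n) == isqrt_compile(n) == math.isqrt(n)
           for n in range(1, 200))
```

Both map the input to the same corresponding output; only the search version needs the answer space to be finite and testable.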

How Is This Helpful?

Realizing that a program you are making implements a certain kind of algorithm allows you to take advantage of all previous work on algorithms of that type. Currently, the way primitives and libraries are organized in most languages, they don't lend themselves to revealing the high-level purpose of a program, and it's actually very easy to go slowly from a few parameters to a configuration file to a domain-specific language without even realizing it.

When things become that complex, you will wish you had designed your configuration interpreter as a compiler, which people have been making for years and for which there are, in general, well-designed methods of handling the issues that come up during compilation. Likewise for searching and network protocols. And I suspect the same is true for prediction and resource management.

Algorithm Mismatches

Some algorithms pretend to be in one category when the problem they are trying to solve is fundamentally in another category. For example, web search. Certain cases aren't really search at all; they're prediction — because a person doesn't know what he's searching for until he's seen it. In other words, a person types in "Brad Pitt's birthday" as a Google Search, and there is a definite answer. But if someone searches for "cook", does he mean "cook" the verb or "cook" the proper name? If he means the name, does he mean the company, the county, or the person? (This is exactly why there are things like Clusty.) Or maybe his intent was literally to find all the things related to the word "cook", which most search engines assume. Because this is a search problem they can solve.

But the fundamental problem is ultimately prediction because even if the search weren't ambiguous, say it were about cooking the verb, what about cooking is the person looking for? Maybe the person is looking for a good recipe to impress his honey this weekend. Or maybe he doesn't know how to cook at all and wants to learn. Or maybe he's a master chef wanting to read articles about cooking.

An application developer will say that if this were the case, the person should have typed in exactly that: "articles about cooking". But my point is that even if he did type in "articles about cooking", which article is the one that will satisfy him? And this is something the person does not even know, until he's seen it. This is why the ultimate problem is prediction, not search. Search is merely an approximation. It may be an extremely useful one, but it's still an approximation. And seeing this mismatch will allow you to take your algorithm to the next level.3


Repetitiveness of Programming

Programming — as often as programmers attempt to never repeat themselves — is an extremely repetitive task. From a high-level view, all programs do these 5 things in varying combinations and details of problem domains. So the question arises... Can we simply master these 5 things, create the ultimate library for each of them, and only specify the details or parameters when writing a new program? At first, this seems like what we do already with programming languages. However, what this pattern suggests is that the primitives of the language should be Searching, Predicting, Networking, Managing Resources, and Compiling. Not looping, maintaining data structures, or other more low-level operations that are inherent in implementing the Big 5.

This is not a brand-new idea; it is just taking the next step. Already we see languages that have just one of these primitives, and they excel beyond the rest. For example, Perl's regular expressions are a form of searching. Perl did it right and made it a language primitive. When people saw how powerful Perl regular expressions were, they duplicated them in practically every other language. Regular expressions, though, are still very specific in that they are designed only to search strings for a limited kind of pattern.
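To see what "searching as a language primitive" buys you, compare the declarative form to the loop you would otherwise write. A small sketch using Python's regex support (the log format here is invented for illustration):

```python
# A regular expression is a declarative *specification* of a search;
# the engine supplies the algorithm. One line replaces a hand-written
# scanning loop.

import re

log = "GET /index.html 200\nGET /missing 404\nPOST /form 500"

# Find every request whose status is a 4xx or 5xx error.
errors = re.findall(r"^\w+ (\S+) (4\d\d|5\d\d)$", log, flags=re.M)
assert errors == [("/missing", "404"), ("/form", "500")]
```

The suggestion in this post is simply to give Predicting, Networking, Managing Resources, and Compiling the same treatment regular expressions gave Searching.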

What would the interface to these 5 modules look like? I'm not sure, but the ultimate goal is that they entail a much higher-level language than mainstream programming languages today.


When you start to look at what I'm suggesting, you might recognize that this has already been thought of. It is the idea that your program is a specification for a search engine or a specification of a compiler, and your new compiler knows how to create the search engine or compiler from your specification of it. In other words, it is a search engine compiler. Or a compiler-compiler.

Years ago, much research was done on creating compiler-compilers. As a result, we have tools like Lex and Yacc. "Yacc" even stands for "Yet Another Compiler-Compiler". The research went further though, attempting to automatically generate the next phase of compilation. Other phases have been completely automated in a similar way with tools like ml-burg (or its equivalent in other languages like JBurg).

Similarly, a specification of a network protocol should entail its implementation. A policy for resource usage should entail its implementation. Likewise for the other categories.

Then the specification to combine these — yet another instance of one of the above types — yields the whole program.
1. If you haven't noticed by now, I don't usually bother with proofs. All axioms and even the rules of inference themselves are intuitive assumptions. For those who don't see that, I challenge you to try to prove induction, or better yet, deduction. (My study of epistemology taught me that intuition is the complementary tool of logic, and that the only way to know something for certain is to know the knower.) I realize this opens me up to a lot of criticism, but so be it. ...As long as we keep questioning.

Are there any categories that I missed?

I didn't include databases even though some might argue they are more fundamental since all algorithms use some kind of data structure. However, even though data drives all algorithms and the algorithms themselves are a form of data, no one would care about the data if they could get the results of the algorithms without the data. So databases are an implementation detail, and I am only concerned with the highest level in this post.
2. If you want to get really high-level, every program does this — transform input into a corresponding output. And this is what led me to the title "Everything Is a Compiler". ...Except perhaps truly intelligent machines, which some argue are more than I/O.
3. Realizing that web searching is really prediction, you start to see that current criteria like frequency of keywords on a page or page rank make up the fitness function of a genetic algorithm. The fittest pages are then presented to the user in search results, who then clicks through and/or refines his search query. Smart search engines actually learn from this user-interaction — hence things like Google's search history and bookmarks. And they evolve their fitness function based on this. Voilà — co-evolution drives one of the most used, most influential websites of all time.

Monday, October 22, 2007

Perl Is Female

I'm learning Perl now. I've used it before but never got deep into it.

From a logic-foundations perspective of programming languages, Perl makes absolutely no sense. Its semantics have no rhyme or reason. It is completely illogical.

I searched for a Perl language spec and found this on Wikipedia.
There is no written specification or standard for the Perl language, and no plans to create one for the current version of Perl. There has only ever been one implementation of the interpreter. That interpreter, together with its functional tests, stands as a de facto specification of the language.
This basically says to me that Perl was designed as much as English was designed. And indeed, Perl is compared to a natural language.

I searched further and found the Perl language reference. And the Perl syntax page describes semantics, not syntax at all. On that page, it repeatedly explains a rule, explains that there are exceptions to the rule, and then warns you: "don't do that"! It points out other things you "probably shouldn't rely upon".

...But that's just it, isn't it? Perl is illogical. But it is intuitive. If programming languages had a sex, Perl would be female.

Its intuitive semantics can be seen in conditionals. The following is from the Perl syntax page.
The number 0, the strings '0' and '', the empty list (), and undef are all false in a boolean context. All other values are true. Negation of a true value by ! or not returns a special false value. When evaluated as a string it is treated as '', but as a number, it is treated as 0.
What??! ...The only types that seem to exist in the core language are scalars, lists, hashes, and functions of arbitrary signature. References seem to be the only thing with an arrow kind.
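Those rules are concrete enough to model directly. Here's a minimal Python sketch of the quoted boolean-context rules (the function name `perl_true` is my own, not anything from Perl):

```python
# A toy model of Perl's boolean context, per the quoted rules:
# 0, '0', '', the empty list, and undef (None here) are false;
# everything else is true.

def perl_true(value):
    """Return True if `value` would be true in Perl's boolean context."""
    if value is None:                    # undef
        return False
    if isinstance(value, (int, float)) and value == 0:  # the number 0
        return False
    if value in ('', '0'):               # the strings '' and '0'
        return False
    if value == () or value == []:       # the empty list
        return False
    return True

# Unlike Python, Perl treats the string '0' as false:
assert perl_true('0') is False           # false in Perl...
assert bool('0') is True                 # ...but true in Python
assert perl_true('0.0') is True          # yet the string '0.0' is true!
```

That last line is the kind of intuitive-but-illogical corner the post is talking about: only the exact strings `''` and `'0'` are false, so `'0.0'` sails through as true.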

All of this starts to make sense when you read about how Perl is an analogy to natural languages. Most languages look at programs as machines. Perl looks at programs as novels. There's nothing wrong with either way; they're just different. In fact, this is exactly what makes Perl interesting.

The design of Perl seems to be that of a medium not for engineers, but for people who create works of art or works of literature.

One claimed benefit of its natural-language-ness is that you can learn as you go, as there are many levels of acceptable competence. (Take my competence in English for example, using words like "natural-language-ness".) This results in there being many poorly-written Perl programs, and Perl advocates argue that the burden of creating well-written code is rightly put on the programmer, not the language designer. The alternative, however, is that those incapable of writing beautiful code never even learn the language. Whether fortunately or not, a language will always be judged to a large extent by the programs written in it.

Dabbling in Haskell recently, I am definitely going to learn more Perl simply for the contrast, as trying different things leads to more patterns.

Thursday, September 27, 2007

Methods to the Aha

When I was a fresh CS major at CMU, I took the standard 15-251 Great Theoretical Ideas of Computer Science class. The professor Steven Rudich introduced us to some amazing topics. Even though I was never able to use it to its full potential, the most valuable thing taught was what he called the "aha! method". In other words, he taught that there was a method to getting to the point where a light bulb goes off in your head and you get an insight that allows you to solve a hard problem, making you say "aha!". And moreover, that this method can be learned.

The list he gave was as follows.
  • Phrase Hygiene
  • Representation
  • Induction
  • Modularity
  • Exemplification
  • Refinement
  • Abstraction
  • Bracketing
Briefly, he meant: tag assertions with phrases that indicate your degree of conviction; actively choose your representation of the problem; use the various forms of induction; break the problem into smaller subproblems; use small examples; don't give up after you've found an answer; abstract away the non-essential details of the problem; try to give upper and lower bounds and make them converge. I'm not going to go into more detail on each of them, but the overall idea is that each is a trick to help you more easily see the pattern in the problem that will lead you to a solution.1

As Rudich pointed out, in any field — martial arts, violin, tennis, magic, programming — the novice makes a huge motion, the black belt makes a small motion, and the master makes a tiny motion. The more complete the pattern you see, the less work is required to accomplish a task.

Recently, I've found a new method. What made me see this was an "interview question" I heard way back when I was in Rudich's class. It goes something like this... You have to cross a gorge that's 50 feet wide and 50 feet deep. All you have is two 20-foot ladders and all the rope you need. How do you do it?

And no, you can't go around it. What now? (You might want to actually think about this before reading on.)

Select text here for the spoiler... The answer is — since you have all the rope you need — that you fill the gorge with rope, and walk across.

Now, the answer to these riddles is often obvious in hindsight. But what can we learn from it? Why couldn't we see the answer before? It wasn't until recently that I realized that if you abstract away the details of this solution, what you had to do was substitute an abundance of one resource for the lack of another resource. And this, I've discovered, is another method to the "aha!": Universal Substitution.

What I mean is that any resource can be substituted for any other resource. And this is simply amazing!

...As I mentioned in my last post, analogy seems to be a pattern of all of these. The Representation method above seems to hint at that. But how do you know which representation to choose? You have to already have made a connection — an analogy — with another problem.

Jeff Hawkins seems to think that all problems are solved through either memory or analogy. In other words, either you remember the solution to the problem because you've seen it before, or you relate the problem to another problem, remember the solution to that, and then apply it to the specific situation at hand, which is something your mind does all the time.2

If this is true (which seems to make sense), then analogy is a problem-solving method. But not only that, it is a method to finding problem-solving methods! It is a method of solving any problem, even meta-problems and meta-meta-problems.3

So this leads us to our next question. How do you become good at creating analogies?

Analogies are connections between two different things. They are the result of abstracting something away from two things that are different and being left with things that are the same. Analogies are patterns.

One thing I've noticed that increases the likelihood of seeing patterns is trying different things — all sorts of things. Oftentimes you suddenly realize that things you've become so used to don't have to be the way they are. That it is not necessarily the right way or the wrong way. But also, when it's different enough, you can't help but see what isn't different. In other words, what remains constant — which is a pattern.

You never know where a pattern might show up, or what lines it might cross.4 Plus, oftentimes the analogies within a field have already been exhausted by others, while cross-discipline analogies lie unreaped.

Also, there is the trap of becoming too close to something. After working on a problem arduously for a while, it is often helpful to first take a break, and then step back and look at the problem to get a bigger perspective. In other words, disentangle yourself from the details and take a look at the whole picture at once, just as if you were putting together a jigsaw puzzle.

All this really is, is another example of the law of Balance.

I've actually attempted to devise an algorithm for finding patterns before. The thing is, there are a seemingly infinite number of ways to categorize things. So if you try to categorize things and then look for something recurring, you'll never know if you're simply using the wrong categorization. That's why it's hard to learn something new in general. According to Jeff Hawkins, you have to see a pattern before you can learn anything. The people who tend to be seen as smart are the ones who pick up new things quickly, and they can do this because they see the patterns quicker than others. But it is a completely unconscious activity to them.

This begs the question though, is there another way of finding patterns? Can we find patterns without first categorizing things?

It's like if you have a hash function that abstracts away from things you're looking at, and you keep hashing different things hoping to find a collision. Once you do, you've found a pattern. For example, if your hash function took physical objects as input and gave a color as output, you could hash all the things on your desk and if any items resulted in the same color, you would have found a commonality between those two things, which is a simple yet significant pattern.
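The hashing idea sketches out naturally in code. Here's a minimal Python version, where the "hash function" and the color example are just the illustrations from the paragraph above:

```python
# Pattern-finding as collision-finding: map each item through a lossy
# function and see which items land in the same bucket.
from collections import defaultdict

def find_collisions(items, hash_fn):
    """Group items by hash_fn; any group of size > 1 is a 'pattern'."""
    buckets = defaultdict(list)
    for item in items:
        buckets[hash_fn(item)].append(item)
    return {key: group for key, group in buckets.items() if len(group) > 1}

# The desk example: hash physical objects down to just their color.
desk = ['red stapler', 'blue pen', 'red mug', 'blue notebook']
color = lambda thing: thing.split()[0]   # abstract away everything but color

assert find_collisions(desk, color) == {
    'red':  ['red stapler', 'red mug'],
    'blue': ['blue pen', 'blue notebook'],
}
```

The open question in the text survives the sketch: the whole outcome depends on which `hash_fn` you pick, and nothing here tells you how to pick it.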

But what if you're looking for another pattern? Which hash function should you use? Is there another way besides this hashing method?

See also: Side-Stepping Obstacles in Space and Time
1. Try these out on this logic puzzle; they really work. The method can take you from being completely stuck to a point where you can work through the problem.
2. What Jeff Hawkins describes is abstraction and application. Sounds like functional programming to me. Lambda abstraction and function application. Is there anything more fundamental?
3. This really makes you wonder about things like Numenta, Jeff Hawkins' implementation of the brain which he calls hierarchical temporal memory. What problems will be solved once the size of a brain becomes limited by the amount of money someone is willing to throw at it, instead of by the human skull?
4. I'm not at all interested in sports, but recently I went to a baseball game with my brother, and I realized that anything — once you start optimizing, it becomes something so deep that you could spend an entire lifetime mastering it. Baseball. Cooking. Driving. Conversing. ...What then, is truly worth mastering?

Monday, September 17, 2007

Programming Is Writing

Programming is just like writing. Prose that is.

Specifically, I'm talking about essays — prose in which the purpose is to explain something. Which is exactly what a program does. It explains an algorithm.

Both programming and writing are constructive, as in the constructive logic sense. Writing a program is writing a proof, is writing an essay. In both, it is desirable to be clear and concise. What you write must be organized; the most straightforward way is starting with a high-level overview, or a top-down organization. Both programs and prose must be read and understood sequentially. Sure you can jump around, but in both, each statement must follow from the previous one. And sub-points, or sub-goals, must flow from beginning to end to form the final point.

In prose, in addition to giving explanations, you give examples to get your point across. So why don't programming languages allow you to write examples as part of the source? In a sense, pattern-matching is exactly this but very limited. Is it not done simply because compilers are not good at inferring from examples? Even if this is the case, we shouldn't let it stop us. The purpose of a programming language is to allow an algorithm to be expressed, and secondarily to be executed. Now I know the economists out there will argue with me. And you're right. But I'm talking more from a design perspective.
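Some of this already exists in embryonic form: Python's doctest module lets examples live in the source itself and be checked mechanically. A small sketch (the `interleave` function is just an illustration):

```python
# Examples as part of the source: doctest extracts the >>> examples
# from docstrings, runs them, and fails on any output mismatch.
import doctest

def interleave(xs, ys):
    """Interleave two lists, e.g.:

    >>> interleave([1, 3], [2, 4])
    [1, 2, 3, 4]
    >>> interleave([], [9])
    [9]
    """
    result = []
    for x, y in zip(xs, ys):
        result.extend([x, y])
    shorter = min(len(xs), len(ys))
    longer = xs if len(xs) > len(ys) else ys
    result.extend(longer[shorter:])
    return result

# Run every example embedded in this module's docstrings.
assert doctest.testmod().failed == 0
```

It's still a far cry from a compiler that infers behavior from the examples, but it does make the examples first-class parts of the program text rather than a separate test suite.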

Stories are also proofs — they are illustrations of proof through action. That's why theories often start from anecdotal evidence. Anecdotes are short story proofs. Everyone uses them all the time. Story is the universal proof-language. In other words, the universal programming language.

Now, I wouldn't want to write all my programs in this universal language for practical reasons. English for example, is not optimized for expressing algorithms. But it's universal in the sense that anyone can understand and relate to stories in the language — i.e. programs in the language — because they describe life experiences, which English is optimized for.

I was talking with Kyle the other day about how I and other programmers sometimes alienate people when we speak because we either have to clarify intuitively obvious ambiguity or explain things by decomposing them into such detail that the explanation becomes incomprehensible to others. The other day, someone asked Kyle if the hat he was wearing was new. He replied by asking what the person meant by "new". From an intuitive perspective — the perspective of the person asking the question — Kyle should have answered simply "yes" or "no". But instead, the inquirer became slightly confused by the unexpected response, and they went back and forth going nowhere.

Kyle claims that programmers are better at recognizing puns because puns are instances of ambiguity, which programmers have been trained to spot. (I disagree though, because I find myself missing obvious puns, and I think this has to do with the seemingly under-developed right hemisphere of my brain, or at least the disconnectedness of the two hemispheres.) But that got me thinking... if programmers are good at spotting puns because puns are instances of ambiguity, wouldn't that make them bad at creating puns? Because I read recently in On Intelligence that part of what makes Shakespeare a genius in writing is how he can create a metaphor between two things and not actually be talking about either of them. I'm not an expert on Shakespeare, but I do know it's hard to use metaphors in writing. It's simply more abstract.

So if programming is just like writing, what would be the equivalent of a pun in programming?

A pun isn't merely ambiguity between two or more different meanings; it's actually double meaning. Triple meaning. Any or all of the meanings make sense.

So to have this in programming, for example, when a language has two namespaces (one for variables and one for types), you would have to declare a name in both namespaces and find a way to state the name in such a way that both the variable and the type make sense.

Are there any languages out there that allow for such a thing? I'm not sure. But an obvious case of this would be for printing in a language that has type information available at runtime. You could write something like print foo where foo is defined as both a variable and a type. Which should it print? Obviously it would depend on the semantics of the language. But my point is, it could do both. Or it could look at you and smile coyly.
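The two-namespace pun can be sketched with a toy resolver. Python itself has a single namespace, so the two dictionaries below simulate separate variable and type namespaces; all the names here are hypothetical:

```python
# A toy sketch of the pun: the same name bound in a variable namespace
# and a type namespace, with `show` able to resolve it either way.

variables = {'foo': 42}            # simulated value namespace
types     = {'foo': 'int list'}    # simulated type namespace

def show(name, prefer='variable'):
    """Resolve `name` as a variable or as a type -- both readings make sense."""
    if prefer == 'variable' and name in variables:
        return repr(variables[name])
    if name in types:
        return types[name]
    raise NameError(name)

assert show('foo') == '42'                       # the variable reading
assert show('foo', prefer='type') == 'int list'  # the type reading
```

Like the pun, neither reading is an error; the "semantics of the language" is just the `prefer` default, and a coy language could offer both.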

Like puns and the idea of including examples directly in the source, not separately in test cases, I believe there are more good ideas to be extracted from this analogy that programming is just like writing. Can anyone out there see any more? I'm working on a post about problem-solving methods that I hope to finish soon, but analogy seems to be an amazing problem-solving technique, if not a pattern of problem-solving techniques in itself. In other words, a meta-meta-solution.

Of Little Importance

I've been going crazy writing recently. I've got 7 other posts that I started this week that I haven't had a chance to finish. The reason I've been having so many ideas for posts is partly because I've been seeing so many patterns. So why have I been seeing so many patterns about programming?

I went to hang out with my crazy-cool cousins this weekend. The kind of cousins I only see a couple times a year, but every time it's a blast. And at some point I realized from a conversation that what I'm doing with programming — becoming a relative expert on it, focusing all my time on it, reading about it, writing about it — other people do all the time with sports, video games, movies, and other things. In other words, it's no different. No better. No worse. And the more I focus on programming, the more I will prosper at it. ...But at the same time, I won't be prospering at what's important in life.

This is the pattern of wasting your life, getting caught up in something with only relative value.

I recently saw a documentary called Flight from Death: The Quest for Immortality, and it made me realize that instead of trying to live a great life, if you live your life to make death acceptable, you will end up with a life greater than you can possibly imagine. And the phrase "it's a good day to live" will become synonymous with "it's a good day to die".

I don't believe it's chance that the times I've been on the edge of death were also the times I've felt most alive. It was a coincidence — a coinciding of events. In other words, a pattern. This also points to degrees of being alive — i.e. a life/death continuum — rather than the common conception of a binary switch.

So let's remember to focus on what's important.

Sunday, September 16, 2007

PL Feature Wish List

Here's my programming language feature wish list:
  • Parallelism using list comprehensions like nesl
  • Seamless parallelism over a network like erlang, with ability to extend with encryption/security
  • Turing-complete type-system like Qi, but with implicit runtime verification
  • Turing-complete macros like Lisp (i.e. extending the compiler), but maybe just the ability to have function parameters optionally evaluated which would give you the equivalent, assuming all computations that can be evaluated at compile-time are
  • Expose the interpreter/compiler for meta-circular evaluation
  • Arbitrary meta-data like Java annotations or .NET attributes on all source code nodes, not just functions.
  • Full source reflection at runtime, compile-time, anytime (although this might not be necessary if the IDE has scripting — see below)
  • In other words, blur distinction of compile/run/etc. times
  • Bi-directional mappings for implicit parsing and printing (i.e. syntax skinning) with ability to use them inline (which would, among other things, give you the usefulness of reader macros)
  • Seamless FFI and importing of Java libraries — we have to build on top of something so it might as well be the thing with the most well-developed libraries to make the language a viable option for production code
  • Seamless interaction with the native platform when needed
  • Infinite data structures like Haskell — I guess this means lazy evaluation
  • Seamless language-integrated persistence
  • Type inference with useful error messages when my programs fail to type-check
  • Namespaces — I've never seen a language sustain large applications without these, and many languages without them have failed to catch on
  • Modules and true functors (i.e. functions over modules)
  • Type-classes — they're really useful
  • User-definable monads
  • Examples/test cases in source
An IDE with:
  • Searching by function type-signature, which will also return functions that can be used with simple coercion functions on the parameters (e.g. using stringOfExp : exp -> string as an implicit coercion)
  • Graphing of module and function dependencies so I can see where the entry-points are when I am trying to look for something in a module I'm not familiar with
  • Profiling which feeds back into the editor (color-coded perhaps) so I can see where I should optimize when I'm editing
  • Scripting like Emacs but in the language that it edits, not a variant of Lisp or some other random language that the editor happens to be written in
  • Incremental background compilation with automatic dependency analysis
  • Knowledge of the abstract syntax tree to allow for (scripted) refactoring. In fact, why not define parts of the language (or skin) as mappings between UI events and semantics, taking the idea of bridging the gap between abstract syntax trees and the user even further.
  • Type inference that feeds back into the editor, as if you had typed it in
  • A graphical window designer like Visual Studio's Form Builder or Xcode's Interface Builder
  • Toggling of debug output which is structured, not just text, so that more information can be expanded without re-running. Toggling should be optionally automatic by tracking which pieces of code have changed recently and only showing dataflow affected by the changes.
A library with:
  • Good data-structures (e.g. hash tables, balanced trees, heaps, graphs, etc.)
  • Hash functions on common types and typed encryption algorithms (i.e. an 'a rsa_encrypted type) with an interface that makes them easy to use (i.e. no coding cost and little knowledge of encryption algorithms required)
  • A hardware/platform abstraction layer for every-day drawing using a box-model similar to CSS
Notice that just about everything here is agnostic to what the actual expressions of the language are.

Is this technically feasible? Absolutely. Just about all of it's been done before in one form or another.

Is this realistic? No. But no one ever made it anywhere interesting without dreaming big.

I'll add to this post's list when I think of something else.

Friday, September 14, 2007

Why Tagging Doesn't Work

It requires refactoring.

For example, when I bookmark something on delicious, I tag it with all the things I think are relevant to that site at the time. But the way I categorize sites and what I believe to be important change over time. So the only way I can search through my bookmarks to find all the things related to a certain topic that I wasn't even aware of before is to linearly scan through all my bookmarks and re-tag them.

Is there a solution to this?

Wednesday, September 12, 2007

The Mind Is Strict

Don't think of a red ball. □

Monday, September 10, 2007

Birthday Patterns

Here's a cool experiment to try. Don't read ahead without actually doing this though or you'll ruin the results.

Choose a few people and categorize them. In one group, those who you consider to have a "down-to-earth look", and in the other group, those who you consider to be "hot". That's right. You must choose one. No one can be in both categories.

Who should you choose? Anyone whose birthday or astrological sign you can find out easily. But it helps prevent bias if you don't already know their birthday. So use people whose birthdays you don't know but can easily look up, like a friend on Facebook or a celebrity.

For example, Jennifer Aniston: down-to-earth; Britney Spears: hot.

Once you've categorized the people in your list, look up their birthdays. Notice anything? A pattern?

If you don't see anything, try categorizing people whose birthday is in the first half of the year versus the second half of the year. Now do you see a pattern?

I've found this is scary-accurate. Almost to the point where it's a little creepy. I was playing this game with my friend (who saw the pattern originally): we would pick a celebrity, guess whether they were born in the first or second half of the year based on how they look, and then look up their birthday. We found it worked much better for females, but this is probably due to the fact that we've been training for years judging females on looks. I suspect though that it works just as well for everyone as long as you have a trained eye.

When you start to think about this, the implications are crazy. Behavior and beauty — and as my friend theorizes, how suitable someone is as a spouse — are determined with high probability by your date of birth. I'd like to see skeptics of astrology try to explain this away.

I'm not very knowledgeable about astrology at all. I'm just beginning to notice that there really is some truth to it. We stumbled upon this little game last night after I looked at the birthdays list of my friends on Facebook. I noticed there was a disproportionately large number of birthdays in certain months. So I decided to count them up and graph them in Excel.

Wow! The results were stunning. A single month contained roughly double the median number of birthdays per month, and the next highest month wasn't even close. What does that say about me? ...Apparently I've been inadvertently choosing friends based on birthday. Or your birthday really does correlate with your personality, of which I have a certain preference.

I know the skeptics still won't buy it. But that's okay. Those who see will prosper anyway. Because this is not just evidence of the pre-determination of your life, but also of the rule of Balance. There are rules to the Universe — patterns — and one of them is that you can't be both smart and beautiful, unless of course you're missing something else!

Friday, August 31, 2007

Static Vs. Dynamic Typing

Static vs. dynamic typing. Here are my ravings.

Being thoroughly trained in ML, I understand the benefits of static typing. Here are some claimed benefits.
  1. Reduced CPU usage since types don't need to be checked at runtime
  2. Reduced memory usage since types don't need to be stored at runtime
  3. Other type-directed optimizations
  4. Discovery of errors as early as possible, for all my code
  5. Static typing is more powerful/expressive since dynamic typing is a special case, or subset, of static typing where everything is tagged
Now let's look at the claimed benefits of dynamic typing.
  1. Easier to run and test since there are practically no compile-time or link-time errors
  2. Programs are not limited by the expressiveness of the type system of the language — e.g. heterogeneous data structures w/o explicit tagging
  3. Allows for implicit open recursion, late binding, dynamic scope, and duck typing
  4. Programs are simpler since I don't have to worry about types. Things are often coerced implicitly making my programs more concise, and I don't have to use really abstract things like type variables and higher-order types. In other words, they have been abstracted away by being pushed down to a place I don't have to think about.
  5. Faster prototyping due to not having to write down types which change often during prototyping
There are a few claimed benefits, however, which are not true. Allow me to refute.
Myth: Static typing is more powerful/expressive since dynamic typing is a special case, or subset, of static typing where everything is tagged.
It's true that dynamic typing is a special case of static typing. This can be seen by simply creating a tagged union where each branch is a different possible runtime type. Then make every expression in your language a value of this type. Voilà — dynamic typing.1
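The embedding is easy to sketch. Here's a minimal Python version, using dataclasses to play the role of the tagged union's branches (the names `VInt`, `VStr`, and `Value` are my own):

```python
# One tagged-union type whose branches are the possible runtime types,
# so every expression has the same static type: Value.
from dataclasses import dataclass
from typing import Union

@dataclass
class VInt:
    n: int

@dataclass
class VStr:
    s: str

Value = Union[VInt, VStr]   # the single type of every expression

def add(a: Value, b: Value) -> Value:
    """'+' must inspect tags at runtime, exactly as a dynamic language would."""
    if isinstance(a, VInt) and isinstance(b, VInt):
        return VInt(a.n + b.n)
    if isinstance(a, VStr) and isinstance(b, VStr):
        return VStr(a.s + b.s)
    raise TypeError('type error caught at runtime, not at compile time')

assert add(VInt(2), VInt(3)) == VInt(5)
assert add(VStr('ba'), VStr('nana')) == VStr('banana')
```

Every operation in such an embedding looks like `add`: dispatch on tags first, compute second, and surface mismatches only when the offending expression actually runs.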

However, it does not follow that this makes static typing more powerful. To see this more clearly, let's use a simpler example. C vs. assembly. C is more powerful than assembly. But there are things you can write in assembly that you simply can not write in C. Because you are restricted. Restricted by the definition of the C language which does not know anything about your cool new machine instruction to factor large numbers and break your gramma's closely-guarded RSA-encrypted banana nut bread recipes. However, anything you can write in C, you can obviously write in assembly. So C is a special case, or subset, of assembly.

In other words, the more powerful language is the one that is the subset.

And this does not depend on the specifics of the language. So in the case of dynamic vs. static typing, dynamic typing is actually more powerful. I alluded to this already by the fact that dynamic typing abstracts away types by pushing them down into the runtime system.
Myth: Dynamic typing allows for faster prototyping due to not having to write down types which change often during prototyping.
This myth stems from the belief that types in dynamically typed languages don't have to be written down at all, while types always have to be written in statically typed languages. And this is absolutely false!

First of all, even though you don't have to declare the types of variables and functions in dynamically typed languages, you still write down some types. Whenever you make a class, declare fields or methods of a class, or specify inheritance relationships, you are creating and/or specifying types.

Secondly, type information does not need to be specified in statically typed languages that include type inference. Many languages with type inference like ML and Haskell are specifically designed so that you don't have to write down any types. The only exceptions are enumeration-like data structures (i.e. tagged unions) which have runtime values. So the difference between the amount of type information you must write depends greatly on the language itself, not whether that language has static or dynamic typing.

So we are down to the following for static typing.
  1. Reduced CPU usage since types don't need to be checked at runtime
  2. Reduced memory usage since types don't need to be stored at runtime
  3. Other type-directed optimizations
  4. Discovery of errors as early as possible, for all my code
And for dynamic typing.
  1. Easier to run and test since there are practically no compile-time or link-time errors
  2. Programs are not limited by the expressiveness of the type system of the language — e.g. heterogeneous data structures w/o explicit tagging
  3. Allows for implicit open recursion, late binding, dynamic scope, and duck typing
  4. Programs are simpler since I don't have to worry about types. Things are often coerced implicitly making my programs more concise, and I don't have to use really abstract things like type variables and higher-order types. In other words, they have been abstracted away by being pushed down to a place I don't have to think about.
Some people have counterarguments as to why these are not real benefits. I will go through some of them.
Myth: The cost of runtime type information in speed or memory is not significant, especially with today's hardware.
For general, keyboard-speed applications, the cost of runtime type information is not significant. But in core loops, it can be huge. It is not just a matter of checking the types, which itself can be very costly in a loop; the fact that you must store everything as a tagged value means that everything must be stored as a pair in memory. This means that in dynamically typed languages, there is no way to specify the computation of adding the values in two registers and storing the result in another register. The add operation in a dynamically typed language must load a tag from memory, conditionally branch on the value of the tag, load a second tag from memory, conditionally branch on the value of that tag, load the first number from memory, load the second number from memory, add the numbers, store the result in memory, and then store a new tag. This is ignoring the case where the runtime system has to allocate memory for this new object; then again, the statically typed version may spill registers, so call it even.

For such a simple operation though, this is a huge cost.
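The step-by-step sequence above can be made explicit. This sketch uses a hypothetical `Tagged` class to stand in for a boxed (tag, payload) pair in memory, and logs each conceptual machine step of the tagged add:

```python
# Counting the work in a tagged add, versus a single add instruction
# on two registers in the statically typed case.

steps = []

class Tagged:
    """A boxed value: a (tag, payload) pair stored in memory."""
    def __init__(self, tag, payload):
        self.tag, self.payload = tag, payload

def tagged_add(a, b):
    steps.append('load tag of a');        ta = a.tag
    steps.append('branch on tag of a');   assert ta == 'int'
    steps.append('load tag of b');        tb = b.tag
    steps.append('branch on tag of b');   assert tb == 'int'
    steps.append('load payload of a');    x = a.payload
    steps.append('load payload of b');    y = b.payload
    steps.append('add')
    steps.append('store result + new tag')
    return Tagged('int', x + y)

result = tagged_add(Tagged('int', 2), Tagged('int', 3))
assert result.payload == 5
assert len(steps) == 8   # eight conceptual steps for one addition
```

Eight memory and branch operations where the static version needs one register-to-register add, which is the whole point of the paragraph above.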
Myth: Type-checking is a substitute for testing.
Again, absolutely not true. Although some people may use type-checking (static in particular) for testing program correctness, this is neither its intended purpose nor its limit of use. Type-checking is more about ensuring meaningful, well-defined execution than ensuring correctness. Even though you may hear strong static-typing advocates say, "When your program type-checks, you'll often find that it just works", this is simply not true for large, intricate programs. Although type-checking may help you find errors, it is not the same as testing. Thus, it is not a suitable substitute for testing.

Static type-checking will, however, discover one entire class (or a few classes) of errors at compile-time, early in the development cycle. Catching the same errors in a dynamically-typed language depends on executing every code path, which is simply infeasible.
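As a concrete example (hypothetical, invented for illustration) of a bug hiding on an untaken path in a dynamically typed language:

```python
def describe(n):
    if n >= 0:
        return "non-negative"
    # Type error: concatenating a str with an int. A static type-checker
    # would reject the whole function at compile time; Python only fails
    # if this branch is ever actually executed with a negative number.
    return "negative by " + n

print(describe(5))  # → non-negative (the bug stays hidden)
```

Until a negative number reaches that branch, every test run passes.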

Other Issues

Open Recursion, Late Binding, Dynamic Scope

Open recursion and late binding are just special cases of dynamic scope. It's not clear whether having these features is a good thing or not. They make it easy to introduce bugs when extending code in ways that were not originally intended (which happens fairly often). They can definitely be useful, though, since they allow you to do things that can't be done easily without them. But as I explained earlier, more flexibility does not imply power. In fact it's the opposite, as long as the constraints are well-designed. (Similarly, except in bottlenecks where optimization is crucial, Duff's device is a bad feature for a language to support.) And although you can implement open recursion in a statically typed language, you can't implement dynamic scope in general (and be type-safe) without converting your entire program to use tagged values, which would amount to implementing the entirety of dynamic typing. So in this respect, dynamic typing is actually more flexible.

Some may argue that the language should not constrain the programmer. Let him decide whether or not to use it. After all, exceptions are a special case of dynamic scoping, and many statically-typed languages support this. The problem is that you will inevitably be using or extending someone else's code. And the person who wrote that code will inevitably make a judgment call that you consider to be poor, making the code non-obvious and unnecessarily complicated from your perspective — which can really be said about any powerful language feature, so maybe it's just me. But this one can make it especially difficult to reason about your programs. ...Again, it's not completely clear whether these features are good or bad.
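To make the dynamic-scope idea concrete, here is a minimal sketch of my own in Python (the names dynamic_env, with_binding, and render are all invented): the callee's free variable is resolved against whatever bindings are live at call time, not at definition time.

```python
# A minimal dynamic-scope emulation: a global stack of binding frames.
dynamic_env = [{"indent": 0}]

def lookup(name):
    # Search the most recent frames first: call-time, not definition-time.
    for frame in reversed(dynamic_env):
        if name in frame:
            return frame[name]
    raise NameError(name)

def render(text):
    # "indent" is a free variable resolved dynamically at each call.
    return " " * lookup("indent") + text

def with_binding(name, value, thunk):
    # Install a binding for the dynamic extent of the call, then remove it.
    dynamic_env.append({name: value})
    try:
        return thunk()
    finally:
        dynamic_env.pop()

print(render("a"))                                     # → a
print(with_binding("indent", 2, lambda: render("a")))  # →   a
```

Note that render's behavior depends on its callers, which is exactly what makes such code hard to reason about locally.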

Type-Directed Optimizations

If you care about performance, which you will for just about any production app, type-directed optimizations can help you a great deal. I won't go into the details of how they work, but the basic idea is that the compiler uses type information to optimize your code. Theoretically, you could have a compiler for a dynamically-typed language that performed the same optimizations. However, to do this at compile-time, it would have to collect the same information as a compiler for a statically-typed language. I've never seen a compiler able to do this in general, but such a hybrid is the future of typing.

An alternative is for the compiler to do the optimizations at run-time — i.e. just-in-time.
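One illustration of the storage side of such optimizations (my example, not the author's): when the element type is known to be a double, values can be stored unboxed and contiguous instead of as individually tagged, heap-allocated objects. Python's array module makes the contrast visible:

```python
import array
import sys

boxed = [float(i) for i in range(1000)]    # a list of boxed float objects
unboxed = array.array("d", range(1000))    # contiguous unboxed 8-byte doubles

# Each boxed element is a full object plus a pointer to it;
# each unboxed element is exactly 8 bytes of payload.
print(sys.getsizeof(boxed[0]))  # size of one boxed float (implementation-dependent)
print(unboxed.itemsize)         # → 8
```

A compiler with static type information can generate the unboxed layout automatically; without it, every element must carry its tag.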

The problem with static typing is that there are always more type-directed optimizations that would be nice to do, and always more type-expressiveness that would be nice to add. So more and more complicated type systems emerge, oftentimes complicating a language with quirks such as ML's value restriction. On the other hand, if dynamic typing were used, the problem causing such complication would disappear. A simple solution is often the best.

Types in the Source Code

Dynamic-typing advocates claim that writing down types is a hindrance, especially during prototyping. This stems from the assumption that you have to do additional work to think of and change the types of the expressions you're dealing with.

I definitely understand the criticism of writing down types because I don't usually write down loop invariants, so I could see that as being "extra work". But types are no different. Both types and loop invariants are required to create the program, but not to execute the program.

Anyone who's become proficient in a language with a powerful type system knows that in order to even write a meaningful program, you have to use the correct types. So when you're writing the program, you implicitly (and, for some, unconsciously) give a type to every single expression you write. So why not write down all this extra information that was used to create the program, so that it can be used not only by the compiler, but by you when you come back to your code after a week? The same goes for invariants, loop invariants in particular, but I'll save that for another post on theorem provers!

Knowing what I know about theorem-proving and loop invariants, I believe writing down the types is the right thing. However, in a prototyping situation where the types are changing faster than you can type them, you shouldn't have to write them down. That's where type-inference comes in. But as the system begins to stabilize (as some parts do when they get pushed down in the hierarchy), you should write the types down, because at that point you know what the types should be. The extra information can then be used by the compiler to check and optimize, and by those reading your code to understand your program's interfaces and semantics.

Typing of the Future

Here's the typing hybrid I see as the future, as the answer is never on either end of a spectrum, but a balance somewhere in the middle — a pattern of the Universe.
  1. The flexibility to omit the vast majority of types, but consequently,
  2. a significant amount of type-inference that uses
  3. a powerful type system that is so expressive that it is itself Turing-complete, which requires
  4. blurring the lines between compile-time and runtime to the point where the programmer cannot always tell (and should not always care) whether static or dynamic typing is being used by the compiler.
And we will get there by pushing on.
1. This is where the "un(i)typed language" joke comes from. A language that is "untyped" really has exactly one type for all expressions.

Tuesday, August 14, 2007

You Can Only See...

It's been a recurring idea recently that you can only see what you already know. How then do you ever get to see the things you don't know?

It seems you just have to force yourself. You have to push yourself and trudge through it. Just dive in even though you don't feel comfortable with it. And force yourself to use, and thus learn, the new thing. It's always hard at first, but after repeated use, you start to get more comfortable and develop an intuition.

After reading On Intelligence by Jeff Hawkins, it became obvious to me that not only do genetic memories exist, but I have them and they control me all the time. Reading that book was the trigger that allowed me to understand something new. And now I can see the concept of genetic memories. Something that was not obvious to me now became obvious.

The interesting thing is that this is not just about technology or scientific theories, but also artistic, social, or spiritual matters. For example, why is it that many believe in astrology, miracles, or the existence of a greater power (a.k.a. a god), while others explain it away in terms of chance, scientific laws, or something else trivial and uninteresting?

Due to my numerous run-ins recently with this idea, I'm beginning to think these are just more examples of things that can only be seen if you already get the underlying concepts that make them up.

On a related note, reading The Neverending Story made me realize it's possible to create something that always existed.

Thursday, August 2, 2007

A Strange Medium

Code is a very strange medium. Because of compilers, constructing a program is practically free once you unambiguously design it in code. But the design is constantly changing. It's really like building a skyscraper. So many intricate details, all working together. If any one piece fails, it could cause the whole building to collapse. But somehow it all must integrate to form a coherent whole.

The difference with code though, is that you're always extending it. With a building, the architect designs it once, perfects it, and then sends it off to be constructed. Sure it may come back to him during construction for modifications due to changes in materials or budget. The client may even say, "I want it to have 2 more floors." Fine. Add in the 2 more floors. Work in the new materials, et cetera. But once it's built, it's built.

With programs, there are similar milestones in releases, but there is never an end to the changes and additions of features. Moreover, after you've been working with something for a while, you begin to see patterns. And it makes sense to factor those patterns out — to abstract them away and push them down into a lower level. Or, you may even see a way of redesigning that is so fundamental, that the only sensible way of doing it would be to redo the specification completely.

Because over time, your understanding both of the problem and the solution changes.

The Hard Test

It's recently come to my attention that there are smart, capable programmers out there who are resistant to learning and/or using more powerful programming languages.

I have to admit, in hindsight, syntax can be a huge deterrent. The first thing I used to do when I saw Lisp's parentheses was try to parse them like parentheses in other languages — basically implementing a stack in my head. This is extremely difficult. And the parens just make the code ugly. Something like Haskell on the other hand, I look at that and it's just beautiful.

When I started coding in Scheme and mentioned to someone more familiar with it that I was having difficulty with the parens, he pointed out the fact that you don't have to parse the parentheses in Lisp or Scheme. Just use an editor like Emacs that automatically indents properly based on the parens, and simply look at the indentation.

Oh. That makes things easier.

But besides the whole high learning curve, Lisp in general is very abstract. And thinking abstractly is hard. Now, people don't like doing things that are hard. Why would anyone want to do something that was hard when they could do it an easier way? And this is exactly why most people avoid more abstract programming languages. However, that very abstractness is what makes them so powerful. As long as people believe "anything that can be done in language X, I can do in my language", they will never advance. So many people fall into this trap because of the Turing-machine idea of expressiveness.

But programming isn't only about expression, it's about understanding.

It's a shame though that people who are smart enough to work with abstract tools shy away from them because they are hard. Due to the fact that they are hard to use, the majority of people can't use them to their full potential. Therefore, anyone who has the ability can move ahead of the pack.

The fact that something is hard is never a good reason to not do it. In fact, it's often a reason to do it. If I'm faced with a choice where I seem unable to decide, I ask myself, which option is harder. Because that is probably the right choice. I'm probably only considering the other one to avoid the difficulty in the right choice.

Wednesday, August 1, 2007

ML → Scheme

I wrote my first real Scheme macro today. It was a macro to emulate ML's pattern-matching.

Writing a type-checker in Scheme, I found myself writing all these list?, eq?, and let* expressions, making my code look more complicated than necessary. All of that can be done simply and concisely with pattern-matching. For Lisp hackers out there who aren't familiar with pattern-matching, it's basically a destructuring-bind where you can have constants or symbols in your binding expression which are compared with the value you are binding to, sort of like a cond where the predicates are implicitly specified in the pattern.

The great thing about pattern-matching is that code to process data looks just like the data you are processing. So it makes it extremely easy to look at code that uses pattern-matching and see what it's doing, as opposed to equivalent code that uses explicit conditionals and lets.

Without pattern-matching:

(cond ((and (list? exp) (eq? (first exp) 'typ-arrow))
       (let ((param-typ (second exp))
             (ret-typ (third exp)))
         (do-stuff param-typ ret-typ)))
      ((and (list? exp) (eq? (first exp) 'typ-union))
       (let ((branch-typs (second exp)))
         (do-stuff-to-branches branch-typs))))

With pattern-matching:

(match exp
  (('typ-arrow param-typ ret-typ) (do-stuff param-typ ret-typ))
  (('typ-union branch-typs) (do-stuff-to-branches branch-typs)))

Anyway, when I first started using Scheme, I didn't like the prefix notation and the paren-syntax — or lack of syntax. Compiler writers out there, you know what I mean. ;-) Frankly, I don't know anyone who does like it at first. I thought, I'd rather write code in a more familiar way, then switch to prefixed constructors when I needed to create code like I would in ML. For example, the SML/NJ compiler exposes the abstract syntax as a datatype, so you can do just this.

However, this means that there are 2 representations of code in the language. Two. ...And that doesn't seem right.

After working with macros (I finally had the conceptual break-through of figuring out how to use double backquotes — ick!), I realized, this makes sense. The idea that code and data have the exact same concrete representation makes total sense. So much so that I believe this is the right way to do it.

But one question that comes to mind is, why does that one representation have to be prefix notation with parentheses everywhere?

It's not clear whether being able to write data in the same form as the regular parsed grammar (infixes and all) would be a good thing or not.

One thing I've since learned, but had trouble with when I first switched from ML to Scheme, was that in Scheme, the meaning of parentheses is overloaded. In ML, parentheses are used for disambiguation of parsing, and that's all. This is almost strictly so in OCaml. But in Scheme, not only is it used for specifying the parse in lists and sub-lists in quoted expressions, but also for function application. This confused the hell out of me for a little while, and my first inclination was to avoid using quote (i.e. ') and use list instead. But I soon got over that.

Overall, my experience learning Scheme has been extremely rewarding, as a great way to recognize patterns in something is to experience something completely different from it. And for many of the decisions that were made in designing ML, the opposite choices were taken in Scheme and Lisp.

Seeing ML Clearly

I went to school at Carnegie Mellon. And there, the computer science department is big on research. For system-level programming, they use C. But for almost all theoretical or type related topics, they use ML. In particular, Standard ML.

So since I was really interested in compilers and type-theory, I became very familiar with ML. First how to use it. Then how to use it to make interpreters. How compilers work. And eventually, how to compile ML code. A relative expert in ML.

While at CMU, I was thoroughly trained in the benefits of strongly typed languages, the pitfalls of weakly typed languages, and why static typing can result in more efficient code than dynamic typing. I was also introduced to the idea of typed intermediate languages — that compilers have multiple phases which translate code to entirely different languages, each of which is strongly typed, getting closer and closer to the target language after each phase. In other words, SourceLang => IL1 => IL2 => ... => ILn => TargetLang. And after I got the hang of it, I thought ML was great! Oh, how I came to loathe writing code in other languages. Look! Look how easy and beautiful the code would be in ML.

But recently, for the first time, I've had a real reason to write some code in Scheme. Scheme is similar to ML in that it's functional. But it's dynamically typed. Moreover, some flavors have "features" in them like dynamic scope that make it very difficult to look at a piece of code and determine whether it will compute gracefully or result in an error. One of the biggest benefits of static typing is that it reveals errors in your programs as early as possible — at compile time. Dynamic typing on the other hand, reveals errors as late as possible. If a branch of code is never taken, you'll never know whether that piece of code will fail, possibly until you ship your code and your users break it, losing millions (even lives) in the process.

So all through school, I was on one side. I was very very close with ML. But now that I've been using Scheme (and also toying with Qi [pdf]), I've been on the other side. And now, for the first time ever, I can judge ML for what it truly is. And here's what I've found.

I still think ML's a great language. Early bug detection and the invariants that are captured in types are so utterly essential to writing correct code that I can't believe it's still being done the other way. I literally can't write more than a couple of Scheme functions before I have to write down the types in comments, because otherwise I have to try to hold all this stuff in my head. It's not something in addition to writing the code; types are something I implicitly think about when I create code. When I look at a piece of code, I type-check it in my head to see if it makes sense, in the same way that I execute it in my head when debugging, for example.

However, I've realized there are a few very powerful abstractions that are missing from ML (or lacking).
  1. Macros
  2. Reader Macros
  3. Unrestricted Execution of Types
...Maybe I'll go into the details later.

Saturday, July 28, 2007

Fitting Inadequate Tools

As an after-thought to yesterday's post, I remembered Kyle (another person who prefers Emacs over Eclipse) showing me how he doubles loop iterator variable names, as in ii instead of i or jj instead of j. This allows you to search for loop iterator variables without results coming up everywhere the letter "i" is used in your file.

What he was basically suggesting was that I change the code I write in order to accommodate the inadequacies of a tool, namely, textual searching.

This does not make sense to me at all. If I had a tool that understood the scoping rules of my language (e.g. Eclipse editing Java), I could select a variable and the tool would show me all the references to that variable, regardless of its name. Granted, this doesn't help if you're searching for all the loops that use the variable name, but (again, maybe this is just me) this is not something I ever do. And if I needed to, a regex would handle that (or find whole word).

Changing your work, no matter how slight, to fit your tools is an indicator that you need a better tool. Because basically, you've found a pattern. And whenever you find a pattern, you've found an opportunity for improvement. Changing your work to fit your tools is equivalent to optimization. You should only do it if you have to; in other words, if it is a bottleneck. And there are two ways to optimize for a bottleneck: from the top down, or from the bottom up. Doing it from the top down means you must always think about it. Doing it from the bottom up, abstracting the optimization away in a tool, means you don't have to think about it anymore. It is equivalent to pushing a process down to a lower level in the hierarchy. And the top level is the only level you have to think about.

...This brings up an interesting question: what is above me in the hierarchy? It would be silly to think I was at the very top.

Friday, July 27, 2007

Tool Power and Why I Use Eclipse

I was talking with Kyle the other day about how, with programming languages, when you look up at languages more powerful than the ones you know, all you see is what you already know, and the supposedly more powerful languages seem not so great. However, when you look down at less powerful languages, it's obvious that they are less powerful. I'm not talking about computational power, as all Turing-complete languages can express the same programs. It's more about abstractions. Paul Graham describes this in more depth, calling it the Blub Paradox.

That is all very well and interesting in itself. But if you accept it, you must also accept its implications. Namely, it makes sense to learn and use more powerful languages.

But it's not just about languages. It's about tools. Yesterday at work, I had a discussion with two co-workers about the differences between Emacs and Eclipse as IDEs. Them both being advocates of Emacs, I figured I could get the lowdown on it from them since I only use it occasionally and am not an expert. For writing Java (unfortunately, yes, I must do this at my current job), I prefer Eclipse. But I'm open to using other more productive tools. So I thought to myself, if Emacs is so great, maybe I can find out from one of these guys why, and then perhaps I'll switch.

Basically, their argument came down to 2 things. First of all, Emacs can be more easily extended than Eclipse which has a clumsy plugin system. And secondly, only Emacs allows you to re-map the key bindings which is extremely helpful for writing code, which is where you spend the majority of your effort compared to other tasks you'd use an IDE for.

However, there's one feature of Eclipse in particular (there are others too, but I'll not go into them) that Emacs does not have and neither of the guys I talked to knew of a standard extension out there that already does this. It is the code refactoring features — specifically, renaming variables, classes, and packages.

I admit, any programmer is going to spend 10 times as much of his time on writing code. Renaming things is a rare task comparatively. However, when it must be done in Emacs or editors that don't support this, you must use regex replacing. And this is both tedious and error-prone. Textual find and replace will always be error-prone since it does not take into account scoping rules of the language.

The fact that Eclipse allows you to do this correctly every time means that the cost of renaming drops to practically zero. This means you no longer have to avoid it, which in this case means you no longer have to plan ahead to try to avoid it in the future. But refactoring and renaming is inevitable. In fact, the more often you refactor the better because it prevents code complexity and messiness from creeping in. But say you wanted to try to think of the right name from the beginning to try to prevent having to refactor it later. It is impossible to know the correct name for a thing when you create it because code is fluid — it never stops changing. Requirements change, goals change, scope changes. And so too must the code. Moreover, as you design a solution to a problem, your understanding of the problem changes. So even if the actual problem doesn't change, your understanding of how to most efficiently solve that problem will change. Thus, refactoring including renaming is absolutely inevitable to keep the code as close as possible to the model in your mind.

So why should I use an IDE that is oblivious to such things, forcing me to think about more things than I have to? Details which are solely a result of code being expressed as text.

It's true that such a feature could probably be made as an Emacs extension. I do not doubt this at all. But the fact that it is not already done means that like Lisp over C, or C over machine-code, Eclipse is more powerful than Emacs. And this is obvious looking down the power continuum.

...And as for the whole key bindings thing, personally I almost never find my keystroke rate as the bottleneck. Writing code is usually limited by my conceptual understanding and the translation of ideas into the language I'm writing in. But maybe that's just me.

So I don't think the guys I had this conversation with got this out of it, but because of this conversation, I realized that Eclipse is a more powerful tool than Emacs (for editing Java), and I don't plan on switching any time soon.

Post Post: Since writing this I've realized I've fallen into the same trap as others, namely, the trap of not seeing the power of something you don't understand. I've done the exact same thing that users of less powerful tools and languages do when they look at something more powerful. I've disregarded the importance of re-mapping key bindings without actually getting to know them. It's possible this is a huge gain. This doesn't change the fact that Eclipse is more powerful in other respects. So it is really a value-judgment of mine that the cost of typing slowly is less than that of having to refactor things manually. But I won't know for sure until I learn and use the more powerful features of Emacs.