On Homoiconicity

Sep 27, 2022

Recently I wanted to use the word homoiconic to describe a design feature. I have my own idea of what meaning is conveyed by the word, but decided I should probably look up modern usage to see what meaning others might take from it.

I encountered a lot of arguments, and a lot of concern over whether the term was even meaningful.

From the Wiki page DefinitionOfHomoiconic:

There is some debate about what exactly constitutes homoiconicity, and about which languages are homoiconic.

Unfortunately this page does not exist on the current Wiki as of September 2022.

Questions of the following type abound on forums, which I paraphrase:

Is homoiconicity well-defined? Is the word mathematically rigorous? Is there a test that can be applied to show that Lisp is homoiconic but C isn't?

Taking this question at face value: every definition I've seen that attempts to treat homoiconicity as a formal property of a language focuses on "the internal representation of code"; that is, it is concerned with the implementation. If we wanted to formalize this mathematically, perhaps we could instead apply the definition to a formal semantics.

No approach to formalizing the semantics of C that I have seen could be described as homoiconic. One would expect a natural account of the semantics of C to discuss memory and addresses, rather than something that looks like C expressions. Indeed, a language based more on procedures than on expressions simply does not have homoiconicity in mind as a design feature.

Both K&R and the ANSI C standard are themselves informal (in the mathematical sense) natural-language descriptions, whereas the specifications for e.g. SML and Scheme provide formal semantics. The Scheme specification, in particular, provides a fairly homoiconic formal semantics in terms of mathematical objects that look very much like Scheme expressions.

However, I doubt that formally rigorous mathematical definitions will actually help much here, or provide better meaning or understanding to programmers.

For example, a very practical and meaningful distinction between the code-as-data properties of Lisp, versus those of C, can be noticed in the following observation:

You can write a Scheme interpreter in about 20 lines of Scheme.

You'd be hard-pressed to write a C interpreter in less than 20,000 lines of C.

This is an observation that is directly relevant to a programmer.
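To make the first half of that observation concrete, here is a minimal sketch of such an evaluator. It is written in Python rather than Scheme so that it can stand alone, and it rests on some assumptions: programs are represented as nested Python lists, strings stand in for symbols (so there are no string literals), and only a small subset of Scheme's special forms is handled. The names `evaluate` and `GLOBAL_ENV` are illustrative, not taken from any real implementation.

```python
import operator

# Assumption: a program is plain data. Lists are applications or special
# forms, strings are symbols, and anything else evaluates to itself.
GLOBAL_ENV = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
    "=": operator.eq,
}


def evaluate(expr, env):
    """Evaluate a tiny Scheme-like expression in the environment `env`."""
    if isinstance(expr, str):            # symbol: look the variable up
        return env[expr]
    if not isinstance(expr, list):       # number or other atom
        return expr
    head = expr[0]
    if head == "quote":                  # (quote x) returns x unevaluated
        return expr[1]
    if head == "if":                     # (if test then else)
        _, test, then, alt = expr
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "define":                 # (define name value)
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "lambda":                 # (lambda (params...) body)
        _, params, body = expr
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)           # ordinary application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return func(*args)
```

With `env = dict(GLOBAL_ENV)`, calling `evaluate(["quote", ["+", 1, 2]], env)` returns the list `["+", 1, 2]` as data, and passing that list back to `evaluate` yields 3. The evaluator stays this short largely because the program arrives already in the data structures the host language manipulates; there is no parser and no separate syntax tree type.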

While it would certainly be of mathematical interest to provide some sort of formalism that could help mathematicians to study this, the observation is clearly meaningful and valid whether or not mathematicians have provided a formalism to help explain it.

Most formal mathematical definitions would not distinguish 20 lines from 20,000 lines, so a completely naive approach would likely fail, although I would not bet the house against there being some more subtle way to formally tease out the difference. I would not expect this to work well for C itself because, again, C doesn't have a clearly suitable formal semantics.

As a design concept

Asking whether or not a language "has homoiconicity", as though it were a desirable marketing term to attract users, doesn't seem particularly fruitful. In my work as a programmer, homoiconicity has been more productively understood as a design concept.

Consider the design of the UNIX shell: one runs programs in the shell by typing commands, but that same sequence of commands can itself be packaged into a program, as a shell script. That is, there is homoiconicity between user interaction with programs, and programs themselves.
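A small sketch can show this property from the outside. It assumes a POSIX `sh` is available on the PATH; the helper names `run_typed` and `run_script` and the sample `COMMANDS` text are hypothetical. The same text is first fed to the shell on standard input, as if typed, and then saved to a file and run as a script.

```python
import os
import subprocess
import tempfile

# Assumption: a POSIX-compatible `sh` is installed and on the PATH.
COMMANDS = 'greeting=hello\necho "$greeting world"\n'


def run_typed(commands):
    """Feed the commands to sh on standard input, as if typed at a prompt."""
    result = subprocess.run(
        ["sh"], input=commands, capture_output=True, text=True, check=True
    )
    return result.stdout


def run_script(commands):
    """Save the very same text to a file and run that file as a script."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as script:
        script.write(commands)
        path = script.name
    try:
        result = subprocess.run(
            ["sh", path], capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        os.remove(path)
```

Both functions hand the shell identical text, and `run_typed(COMMANDS)` and `run_script(COMMANDS)` produce the same output. Nothing had to be translated to turn the interaction into a program.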

The "everything is a file" principle similarly provides uniformity of representation between numerous other concepts, e.g. between devices and device drivers.
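The benefit is easiest to see from the consumer's side. The following is only an analogy, using Python's file-object protocol rather than the UNIX kernel interface, and the function names are hypothetical: one routine written against "something that yields lines" works unchanged on an in-memory buffer, a file on disk, and a pipe.

```python
import io
import os
import tempfile


def count_words(stream):
    """Count whitespace-separated words in any iterable of text lines."""
    return sum(len(line.split()) for line in stream)


def count_from_memory(text):
    """Read the text from an in-memory buffer."""
    return count_words(io.StringIO(text))


def count_from_file(text):
    """Write the text to a temporary file on disk, then read it back."""
    with tempfile.TemporaryFile("w+") as handle:
        handle.write(text)
        handle.seek(0)
        return count_words(handle)


def count_from_pipe(text):
    """Send the text through an OS pipe (assumes it fits in the pipe buffer)."""
    read_fd, write_fd = os.pipe()
    with os.fdopen(write_fd, "w") as writer:
        writer.write(text)
    with os.fdopen(read_fd) as reader:
        return count_words(reader)
```

`count_words` knows nothing about where its input comes from, so a new kind of source becomes usable by old code as soon as it presents the same representation.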

It is then the advantages that homoiconicity can provide that should be weighed: extensibility, automatic extension of new features to old programs without needing to rewrite them, and so on. Homoiconicity, that is, the technique of choosing a uniform representation for objects across otherwise distinct contexts, can then be considered, among other techniques, as a means of realizing these advantages.

History of the term

Thinking of homoiconicity as a design concept, rather than as a formal property of a language, is at least not without historical basis. Here we see the first recorded use of the term in programming, by Calvin Mooers and L. Peter Deutsch, describing their TRAC® language¹ in the 1965 ACM Proceedings:

One of the main design goals was that the input script of TRAC (what is typed in by the user) should be identical to the text which guides the internal action of the TRAC processor.

Because TRAC procedures and text have the same representation inside and outside the processor, the term homo-iconic is applicable, from homo meaning the same, and icon meaning representation.

We can see first of all that the term is used as a design concept, and that they refer primarily to implementation, rather than formal properties of the language itself.

They attribute the terminology to the suggestion of Warren S. McCulloch, after C.S. Peirce, and cite LISP as an inspiration:

Lisp, although elegant in concept, becomes inelegant in practice. It even "cheats" in the frequent use of machine-language procedures. ... Were the S-language the only LISP language, LISP would be close to being homo-iconic (excluding the machine-language functions).

¹ Incidentally, the TRAC® language (with the registered trademark symbol) is one of the three beginner languages recommended by Ted Nelson in Computer Lib, after BASIC and before APL.
