1001 Representations of Syntax with Binding

submited by
Style Pass
2024-05-14 21:30:17

As a compiler developer or programming language researcher, one very common question is how to represent the syntax of a programming language in order to interpret, compile, analyze, optimize, and/or transform it.

One of the first lessons you learn is to not represent syntax as the literal string of characters written by the programmer, but rather convert it to an abstract syntax tree (AST). This has a number of advantages: it enforces structure on programs, makes the syntax easier to traverse and transform, erases irrelevant details (e.g. whitespace and comments), and allows for better separation of different intermediate states of the syntax.

However, syntax is in fact not a tree but a graph: names (of variables, functions, classes, modules, …) can point to potentially far-away locations in the program’s text. The way these work is that one occurrence of the name binds the name (aka the binder), and other occurrences of the same name mention the name, thus pointing to the binder.

Just as representing syntax as a string is not a good idea for most purposes, representing names as strings is usually not a good idea either. This will be clear to anyone who has ever implemented capture-avoiding substitution for lambda-terms with strings as variable names (and if you haven’t, I invite you to do so). Hence we would like to have a generic and universal way to represent the structure of binders-and-mentions, similar to how abstract syntax trees represent the structure of syntactical-constructs-and-their-components.

Leave a Comment