Igalia Compilers Team

submitted by
Style Pass
2024-05-08 18:30:05

MessageFormat 2.0 is a new standard for expressing translatable messages in user interfaces, both in Web applications and other software. Last week, my work on implementing MessageFormat 2.0 in C++ was released in ICU 75, the latest release of the International Components for Unicode library.

As a compiler engineer, when I learned about MessageFormat 2.0, I began to see it as a programming language, albeit an unconventional one. The message formatter is analogous to an interpreter for a programming language. I was surprised to discover a programming language hiding in this unexpected place. Understanding my surprise requires some context: first of all, what "messages" are.

Over the past 40 to 50 years, user interfaces (UIs) have grown increasingly complex. As interfaces have become more interactive and dynamic, making them accessible in a multitude of natural languages has become more complex as well. Internationalization (i18n) refers to the general practice of designing software so that it can be adapted to many languages, while localization (l10n) refers to adapting a specific system for a specific natural language.

Localization of user interfaces (both command-line and graphical) began by translating strings embedded in the code that implements UIs. Those strings are called "messages". For example, consider a text-based adventure game: messages like "It is pitch black. You are likely to be eaten by a grue." might appear anywhere in the code. As a slight improvement, the messages could all be stored in a separate "resource file" that is selected based on the user's locale. These ad hoc approaches to translating messages and integrating them into code didn't scale well. In the late 1980s, a gettext() function was introduced for C programs and later shipped in glibc; it was never standardized but was widely adopted. This function primarily provided string replacement: given a message in the source language, it returns the translation for the current locale. While it was limited, it inspired the work that followed. Microsoft and Apple operating systems had more powerful i18n support during this time, but that's beyond the scope of this post.
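To make the resource-file idea concrete, here is a minimal C++ sketch of a per-locale catalog lookup in the spirit of gettext(). It is not how glibc or ICU implement anything: the names `Catalog`, `catalogs`, and `translate` are hypothetical, and the catalogs are hard-coded in memory rather than loaded from files. The one behavior borrowed from gettext() is the fallback: when no translation is found, the original message is returned unchanged.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for one locale's "resource file":
// source-language message -> translated message.
using Catalog = std::unordered_map<std::string, std::string>;

// Hypothetical set of catalogs keyed by locale identifier.
// A real system would load these from files on disk.
const std::unordered_map<std::string, Catalog>& catalogs() {
    static const std::unordered_map<std::string, Catalog> all = {
        {"es", Catalog{{"Hello", "Hola"}}},
        {"fr", Catalog{{"Hello", "Bonjour"}}},
    };
    return all;
}

// Return the translation of `message` for `locale`. If the locale has no
// catalog, or the catalog has no entry for the message, return the
// message itself, which is the same fallback gettext() uses.
std::string translate(const std::string& locale, const std::string& message) {
    const auto catalog = catalogs().find(locale);
    if (catalog == catalogs().end()) {
        return message;
    }
    const auto entry = catalog->second.find(message);
    if (entry == catalog->second.end()) {
        return message;
    }
    return entry->second;
}
```

With this sketch, `translate("fr", "Hello")` yields `"Bonjour"`, while `translate("de", "Hello")` falls back to `"Hello"`. Pure string replacement like this says nothing about plurals, grammatical gender, or values inserted into a message, which is the kind of limitation referred to above.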
