Exploring parsing APIs: what to generate, and how

submitted by
Style Pass
2024-11-24 04:30:02

When parsing a language like this, a common first step is to define an “abstract syntax tree” (AST) that keeps only the details we need from the parser output.

This type is called an “abstract” syntax tree because it abstracts away the details of the parse output that we don’t need. Our tool doesn’t need the locations of nodes or comments, so the AST doesn’t contain them.

It’s easy to implement a parser for this AST: we iterate over the input, skip whitespace and comments, then decide based on the next character what type of node (integer, string, etc.) to parse. For JSON nodes nested in arrays and objects, we call the parser recursively.
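
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the function names are hypothetical, the AST is represented with plain Python values (`dict`, `list`, `str`, `int`, `bool`, `None`), comments are assumed to be `//` line comments, and string escapes and non-integer numbers are left out to keep it short.

```python
def skip_trivia(src: str, i: int) -> int:
    """Skip whitespace and `//` line comments (the comment syntax is an assumption)."""
    while i < len(src):
        if src[i] in " \t\r\n":
            i += 1
        elif src.startswith("//", i):
            while i < len(src) and src[i] != "\n":
                i += 1
        else:
            break
    return i


def parse_string(src: str, i: int):
    """Parse a string body starting just after the opening quote (no escapes)."""
    j = src.find('"', i)
    if j == -1:
        raise ValueError("unterminated string")
    return src[i:j], j + 1


def parse_array(src: str, i: int):
    """Parse array elements starting just after '['."""
    items = []
    i = skip_trivia(src, i)
    if i < len(src) and src[i] == "]":
        return items, i + 1
    while True:
        value, i = parse_value(src, i)  # recursive call for nested nodes
        items.append(value)
        i = skip_trivia(src, i)
        if i < len(src) and src[i] == ",":
            i += 1
        elif i < len(src) and src[i] == "]":
            return items, i + 1
        else:
            raise ValueError(f"expected ',' or ']' at offset {i}")


def parse_object(src: str, i: int):
    """Parse object fields starting just after '{'."""
    fields = {}
    i = skip_trivia(src, i)
    if i < len(src) and src[i] == "}":
        return fields, i + 1
    while True:
        i = skip_trivia(src, i)
        if i >= len(src) or src[i] != '"':
            raise ValueError(f"expected a string key at offset {i}")
        key, i = parse_string(src, i + 1)
        i = skip_trivia(src, i)
        if i >= len(src) or src[i] != ":":
            raise ValueError(f"expected ':' at offset {i}")
        fields[key], i = parse_value(src, i + 1)  # recursive call
        i = skip_trivia(src, i)
        if i < len(src) and src[i] == ",":
            i += 1
        elif i < len(src) and src[i] == "}":
            return fields, i + 1
        else:
            raise ValueError(f"expected ',' or '}}' at offset {i}")


def parse_value(src: str, i: int = 0):
    """Parse one value at index i; return (value, index just past it)."""
    i = skip_trivia(src, i)
    if i >= len(src):
        raise ValueError("unexpected end of input")
    c = src[i]
    # The next character decides which kind of node to parse.
    if c == "[":
        return parse_array(src, i + 1)
    if c == "{":
        return parse_object(src, i + 1)
    if c == '"':
        return parse_string(src, i + 1)
    if c in "-0123456789":
        j = i + 1
        while j < len(src) and src[j] in "0123456789":
            j += 1
        return int(src[i:j]), j
    for word, value in (("true", True), ("false", False), ("null", None)):
        if src.startswith(word, i):
            return value, i + len(word)
    raise ValueError(f"unexpected character {c!r} at offset {i}")
```

Calling `parse_value(text)` returns the parsed value together with the index where parsing stopped. Note that comments and positions are discarded as soon as they are skipped, which is exactly what makes this AST "abstract".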

A JSON formatter: a formatter needs to know about comments so that it can preserve them in the formatted code. To support this use case, the AST needs to include comments too, which makes it larger and makes parsing less efficient for applications that don’t need comments.

A configuration file parser for a text editor: to be able to show where a configuration error occurs (such as an invalid value used for a setting), the AST has to include source locations. As above, this makes the AST larger and slower to parse for other applications that don’t need source locations.
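
To make the trade-off concrete, here is a rough sketch of what a node could look like once it has to serve both of these use cases. The `Span` and `Node` types and the helper functions are hypothetical names chosen for illustration, not the article's definitions.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Span:
    start: int  # offset of the node's first character in the source
    end: int    # offset one past the node's last character


@dataclass
class Node:
    value: Any  # int, str, bool, None, list of Node, or dict of str -> Node
    span: Span  # needed by the configuration parser for error locations
    comments: List[str] = field(default_factory=list)  # needed by the formatter


def strip(node: Node) -> Any:
    """Drop spans and comments, recovering the plain value-only AST."""
    v = node.value
    if isinstance(v, list):
        return [strip(n) for n in v]
    if isinstance(v, dict):
        return {k: strip(n) for k, n in v.items()}
    return v


def error_at(src: str, node: Node, message: str) -> str:
    """Format an error with the 1-based line number where the node starts."""
    line = src.count("\n", 0, node.span.start) + 1
    return f"line {line}: {message}"


# A hand-built tree for the source below (a parser would normally produce it).
src = "// port\n[80]"
port = Node(80, Span(9, 11))
tree = Node([port], Span(8, 12), comments=["// port"])
```

Every node now carries a span and a comment list, even for a consumer that only ever calls `strip(tree)` to get the plain values back. That extra data per node is the cost the two use cases above impose on everyone else.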
