

In my last blog entry, LL and LR Parsing Demystified, we explored LL and LR parsers from a black-box perspective. We arrived at a model for these parsers in which both input and output are streams of tokens, with the parser inserting rules as appropriate according to Polish and Reverse Polish notation.

In future articles I want to focus in even closer on the details of LL and LR algorithms, but I realized that I should first zoom out and give some motivation for why anyone should care about LL or LR to begin with.

As I wrote this article, it turned into an answer to the question “why is parsing hard?” Or alternatively, “why doesn’t everybody use parser generators?” LL and LR parsing theory is taught in books like Compilers: Principles, Techniques, and Tools (known as “The Dragon Book” and used in many university compilers courses), but then people graduate to find that most parsers in the real world don’t work like this. What gives? This article is my answer to that question.

The theory of LL and LR parsing is almost 50 years old: Knuth’s paper On the Translation of Languages from Left to Right that first defined LR(k) was published in 1965. This is only one of an incredible number of mathematically-oriented papers about parsing and language theory. Over the last 50 years academics have explored the mathematical dimensions of parsing with great vigor, but the field is nowhere near exhausted; even in the last five years we’ve seen some entirely new and important results published. One of the best surveys of the field is the book Parsing Techniques: A Practical Guide, whose bibliography contains over 1700 cited papers!
