Writing an HTML parser is an elucidating exercise. The half-million words in the specification dig up some fascinating historical trivia. It changes you. At least, it’s taught me that in almost every discussion I see online about HTML, practically nobody knows what they are talking about (and before I started this journey I sure didn’t either). Here are some of the most common incorrect claims I see, all of which rest on an incomplete understanding of HTML. While some of them are benign and have their own humorous note in the story of the Web, some actively harm efforts to improve the shared reliability and interchange of documents on the Internet.
In fairness, this sentiment was largely true twenty years ago and earlier. In 2008, however, that chapter ended with the formalization of previously undefined behaviors in the HTML5 living specification.
In the early days, competing browsers built competing parsers, and each one attempted to correct errors in its own way. Behaviors diverged precisely because browsers were trying to recover as much as they could from invalid documents and because no formalization existed for those error cases.