Caught by the fuzz! jsoup 1.14.2 is out now, and includes a set of parser bug fixes and improvements for handling rough HTML and XML, as identified by the Jazzer JVM fuzzer. This release also includes other fixes and improvements.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
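As a rough illustration of that API, the sketch below parses a fragment, pulls link targets out with a CSS selector, and then modifies the matched elements. The class name `JsoupSelectSketch`, its helper methods, and the sample markup are made up for this example; the calls used (`Jsoup.parse`, `select`, `attr`, `body().html()`) are standard jsoup methods, and the jsoup jar is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class; not part of jsoup itself.
class JsoupSelectSketch {

    /** Extracts the href attribute of every link, using the CSS selector "a[href]". */
    public static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    /** Sets rel="nofollow" on every link and returns the resulting body HTML. */
    public static String markNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a[href]").attr("rel", "nofollow");
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch.
        String html = "<p>See <a href='/news'>news</a> and <a href='https://jsoup.org/'>jsoup</a></p>";
        System.out.println(linkTargets(html));
        System.out.println(markNoFollow(html));
    }
}
```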
Guided fuzzing is a testing method that, starting from a defined corpus, generates millions of different input files, using feedback from the instrumented codebase to steer which inputs are tried next. It adversarially creates input content that leads to slow performance or unexpected exceptions. This approach finds areas of the parser that can be improved, leading to a faster and more robust implementation of jsoup.
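To make that feedback loop concrete, here is a toy sketch of the idea. It is not Jazzer and not jsoup's actual fuzz harness. Everything in it is an assumption made for illustration: the class `GuidedFuzzSketch`, the stand-in `toyParse` target with two deliberately planted defects, and the string labels that play the role of coverage instrumentation. Mutated inputs that reach new labels are kept in the corpus, so later mutations build on them; inputs that throw are recorded as findings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Toy illustration only: a hypothetical target and a minimal guided loop.
class GuidedFuzzSketch {
    static final String ALPHABET = "<>a";
    static final int MAX_DEPTH = 5;

    /** Stand-in for an instrumented parser: returns labels for the paths it exercised. */
    public static Set<String> toyParse(String input) {
        Set<String> coverage = new HashSet<>();
        int depth = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                depth++;
                // Planted defect 1: deep nesting is not handled.
                if (depth > MAX_DEPTH) throw new IllegalStateException("nesting too deep");
                coverage.add("depth" + depth);
            } else if (c == '>') {
                // Planted defect 2: a close without a matching open is not handled.
                if (depth == 0) throw new IllegalStateException("unbalanced '>'");
                depth--;
                coverage.add("close");
            } else {
                coverage.add("text");
            }
        }
        return coverage;
    }

    /** Applies one random insert, delete, or replace to the input. */
    public static String mutate(String s, Random rnd) {
        char c = ALPHABET.charAt(rnd.nextInt(ALPHABET.length()));
        int op = s.isEmpty() ? 0 : rnd.nextInt(3); // an empty input can only grow
        StringBuilder sb = new StringBuilder(s);
        switch (op) {
            case 0: sb.insert(rnd.nextInt(s.length() + 1), c); break;
            case 1: sb.deleteCharAt(rnd.nextInt(s.length())); break;
            default: sb.setCharAt(rnd.nextInt(s.length()), c); break;
        }
        return sb.toString();
    }

    /**
     * Runs the guided loop. The corpus list is extended in place with inputs that
     * reached new coverage. Returns a map from exception message to the first
     * input that triggered it. Assumes the seed inputs parse without throwing.
     */
    public static Map<String, String> fuzz(List<String> corpus, int iterations, long seed) {
        Random rnd = new Random(seed);
        Set<String> seen = new HashSet<>();
        for (String s : corpus) seen.addAll(toyParse(s));
        Map<String, String> findings = new LinkedHashMap<>();
        for (int i = 0; i < iterations; i++) {
            String candidate = mutate(corpus.get(rnd.nextInt(corpus.size())), rnd);
            try {
                // addAll returns true only if the candidate reached something new.
                if (seen.addAll(toyParse(candidate))) corpus.add(candidate);
            } catch (RuntimeException e) {
                findings.putIfAbsent(e.getMessage(), candidate);
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<String> corpus = new ArrayList<>(List.of("<a>"));
        Map<String, String> findings = fuzz(corpus, 50_000, 42L);
        System.out.println("corpus size: " + corpus.size());
        findings.forEach((msg, input) -> System.out.println(msg + " <- " + input));
    }
}
```

The point of the guidance is visible in the deep-nesting defect: no single mutation of the seed `<a>` can reach it, but each mutation that nests one level deeper earns a place in the corpus, so the loop climbs toward the failure step by step. A real fuzzer such as Jazzer derives this signal from bytecode instrumentation rather than hand-written labels.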
This testing identified particular content that could cause longer-than-usual parse times or unexpected exceptions, including stack overflow, null pointer, and index-out-of-bounds exceptions. Depending on how the parser is used, such content could contribute to denial-of-service attacks. Versions of jsoup before 1.14.2 are susceptible. We recommend that all users upgrade to this new version.