TreeSitter - the holy grail of parsing source code

Submitted by
Style Pass
2023-03-17 12:00:02

As a software analysis company, we have an inherent need for parsing source code. Until recently we used what is perhaps the most prominent tool for parsing Java code: JavaParser. JavaParser works exceptionally well and has certainly served us well. Over time, however, it became more and more of a performance bottleneck. This was not necessarily because the parser itself is slow, but because we had to make expensive external Java calls from Go to execute it. To keep delivering a snappy UX with more features to come, we had to say farewell to JavaParser and welcome a new parsing library.

As indicated above, the main requirement for a replacement was the ability to call the new parser natively from within our Go code base. We ultimately chose TreeSitter, which is written in C and available in Go via the package smacker/go-tree-sitter, which provides bindings using cgo. With the migration from JavaParser to TreeSitter, parsing source code became 36x faster in our parser benchmarks 🤯.

The speedup is a perfect start to the journey of optimizing our analysis pipeline towards real-time capability. However, performance was not the only thing that steered us towards TreeSitter: it also has various other properties and features that give us an advantage in the medium and long term.
