abseil / Performance Tip of the Week #21: Improving the efficiency of your regular expressions

Submitted by Style Pass, 2023-11-20 09:00:04

Regular expressions are used, misused and abused nearly everywhere. Google is no exception, alas, and at our scale, even a simple change can save hundreds or thousands of cores. In this tip, we describe ways to use RE2 more efficiently.

NOTE: This tip is specifically about RE2 and C++. A number of the ideas below are universally applicable, but discussion of other libraries and other languages is out of scope.

As a prelude, let’s consider an example of how regular expressions are often used: a single call to RE2::FullMatch() that looks for a zone ID at the end of the zone_name string and extracts its value into the zone_id integer.

This tip describes several techniques for improving efficiency in situations such as this. These fall into two broad categories: improving the code that uses regular expressions, and improving the regular expressions themselves.

To understand why the following techniques matter, we need to talk briefly about RE2 objects. In the initial example, we passed a pattern string to RE2::FullMatch(). Passing a pattern string instead of an RE2 object implicitly constructs a temporary RE2 object. During construction, RE2 parses the pattern string to a syntax tree and compiles the syntax tree to an automaton. Depending on the complexity of the regular expression, construction can require a lot of CPU time and can produce an automaton with a large memory footprint. Because the temporary object is destroyed when the call returns, that cost is paid again on every call.
