SIMDized check which bytes are in a set

submited by
Style Pass
2024-12-24 23:00:14

The problem is defined as follows: there's a stream of bytes and we want to get a byte-mask (or a bit-mask) that indicates which bytes are in the predefined set.

Thanks to SIMD instructions this task can be performed faster than scalar code. Jobs like input validation or parsing (for instance CSV files), might benefit from a vectorized approach.

The main ingredient of the techniques shown below is instruction pshufb (_mm_shuffle_epi8), which is present in SSE, AVX2 and also AVX512BW. The instruction does parallel byte lookup in a 16-byte register (or lane, in AVX2 and AVX512 variants) using 4-bit indices from another vector.

In this algorithm the set is represented as a bitmap of size 16 x 16 bits, where a bit of value 1 indicates that given element is in the set. The bitmap is addressed with nibbles, i.e. 4-bit halves of a byte.

The lower nibble of each input byte selects the bitmap's row, i.e. a 16-bit value. The higher nibble selects the column on the bitmap — once we fetched a row row, the higher nibble points at specific bit.

Leave a Comment