Last time (way back in June! I have got to start blogging consistently again) I discussed a gorgeous data structure called the Wavelet Tree. When a Wa

FM-Indexes and Backwards Search

submited by
Style Pass
2024-04-29 03:00:15

Last time (way back in June! I have got to start blogging consistently again) I discussed a gorgeous data structure called the Wavelet Tree. When a Wavelet Tree is stored using RRR sequences, it can answer rank and select operations in $\mathcal{O}(\log{A})$ time, where A is the size of the alphabet. If the size of the alphabet is $2$, we could just use RRR by itself, which answers rank and select in $\mathcal{O}(1)$ time for binary strings. RRR also compresses the binary strings, and hence compresses a Wavelet Tree which is stored using RRR.

So far so good, but I suspect rank and select queries seem to be of limited use right now (although once you are familiar with the available structures, applications show up often). One of the neatest uses of rank that I’ve seen is in substring search, which is certainly a wide reaching problem (for a very recent application to genome assembly, see Jared Simpson’s paper from 2010 called Efficient construction of an assembly string graph using the FM-index).

There is a variety of Suffix Array construction algorithms, including some $\mathcal{O}(N)$ ones (Puglisi et al. 2007). However, I will explain it from the most common (and intuitive) angle.

Leave a Comment