Experiments about a better locate using grep - a3nm's blog

Submitted by Style Pass on 2023-03-15 23:30:04

I have a lot of files and I'm not fond of sorting them intelligently, so I usually give them long and descriptive file names and rely on locate to find them quickly. I was always a bit annoyed to see that locate was not instantaneous even though intuition suggested that it should be, but I was shocked to find out that locate is actually an antique, and that reimplementing a better version of it seems trivial. So I thought I had to investigate more, to replace locate with something that worked better.

I am currently using locate, by which I mean mlocate, not slocate (which is not packaged for Debian), and not the old locate from findutils.

My database in /var/lib/mlocate/mlocate.db takes 300 MB and indexes about 10M files. Indexing runs every night. I don't know how long it takes, and mlocate may be smart about which files to examine, but I don't really care about that. (To be honest, I am a bit concerned about power consumption and hard drive and SSD wear, but that's rather hard to estimate.) Hence, I will focus on the performance of query evaluation only, not indexing. To estimate performance for typical use cases, I grepped my history file to find the latest locate commands that I ran:
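The exact command is not reproduced here. As a rough illustration, the extraction could look like the following Python sketch. It assumes a plain-text history file with one command per line, optionally carrying zsh-style timestamp prefixes; the function name `locate_commands` is hypothetical and not part of any tool mentioned in this post.

```python
import re
import shlex

# Assumption: zsh extended-history entries look like ": 1678900000:0;command".
# Plain bash-style history lines have no such prefix and pass through unchanged.
HISTORY_PREFIX = re.compile(r"^: \d+:\d+;")

def locate_commands(history_lines):
    """Return the history entries whose first word is `locate`, split into words."""
    commands = []
    for line in history_lines:
        line = HISTORY_PREFIX.sub("", line.strip())
        try:
            words = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: the entry cannot be parsed, skip it
        if words and words[0] == "locate":
            commands.append(words)
    return commands
```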

That's 650 invocations (out of slightly more than 100k lines of history). I filtered them to remove those that did not run, and to remove the very rare cases where I used a locate-specific feature (there was one -0 and a few cases where I was searching for multiple patterns simultaneously; otherwise the only flag was -i), so that's exactly 646 queries (of which 527 are unique). Let's time all those commands, running them 3 times to avoid outliers, with no outstanding IO or CPU activity:
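The filtering and timing described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the command actually used: the helper names `simple_query` and `time_queries` are hypothetical, and taking the median of the three runs is one possible way to discount outliers.

```python
import statistics
import subprocess
import time

def simple_query(words):
    """Return (pattern, ignore_case) for `locate [-i] PATTERN`, else None.

    Entries using any other flag (such as -0) or several patterns are rejected.
    """
    args = words[1:]
    flags = [a for a in args if a.startswith("-")]
    patterns = [a for a in args if not a.startswith("-")]
    if len(patterns) != 1 or any(f != "-i" for f in flags):
        return None
    return patterns[0], bool(flags)

def time_queries(queries, runs=3, command="locate"):
    """Run the whole batch `runs` times; return the median wall-clock seconds.

    Assumption: the median is used to discount outliers between runs.
    """
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        for pattern, ignore_case in queries:
            argv = [command] + (["-i"] if ignore_case else []) + [pattern]
            # Output is discarded so that terminal speed does not skew the timing.
            subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        totals.append(time.perf_counter() - start)
    return statistics.median(totals)
```

Each history entry, split into words, would go through `simple_query`, and the non-`None` results would then be passed to `time_queries` on an otherwise idle machine.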
