Style Pass

Thanks to programming interviews, we have all memorized that a binary search performs O(log n) comparisons while a linear search performs O(n), so a binary search must be faster. Unfortunately, this ignores the constant factors. For arrays of fewer than about 100 integers, linear search is as fast or slightly faster, because it is simpler than a binary search. This comparison also ignores the cost of sorting the array, which binary search requires, so the advantage of linear search could be slightly larger in real programs. My conclusion is that if you only have a tiny number of elements (say fewer than ~30), it probably doesn't matter what data structure you use, and an old-fashioned array is a good choice due to its simplicity and memory efficiency. For larger data sets the algorithmic advantages are huge, and you need to choose carefully.

I was recently reminded of this fact when I was browsing the ChangeLog for the STX B-Tree, a C++ in-memory B-Tree. The first line was "changing find_lower() to not use binary search for small node sizes." After investigating, I found that the change was made because it improved performance. I wasted a lot of time during my PhD tweaking an in-memory tree and had "discovered" the same optimization. I remember writing a tiny benchmark and being surprised that it remained true for fairly large values of N. I decided to revisit this and wrote a quick-and-dirty benchmark. The short summary is that up to about 100 integers, the linear search is better or competitive. This is amazing if you consider the comparisons. With 100 elements, the linear search performs on average 50 comparisons, while the binary search performs only 6 or 7, so the linear search is doing roughly 7 to 8 times more "work" in the same amount of time.
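To make the comparison counts concrete, here is a minimal C++ sketch of the two searches over a sorted array, plus a hybrid that picks between them by size. This is not the STX B-Tree's actual find_lower() code. The function names, the comparison counters, and the kLinearThreshold cutoff of 100 are all assumptions made for illustration, and the right cutoff on any given machine would have to be measured.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cutoff: the real crossover depends on hardware, compiler,
// and key type, and should be found by benchmarking.
constexpr std::size_t kLinearThreshold = 100;

// Returns the index of the first element >= key, or keys.size() if there
// is none. Adds the number of key comparisons performed to *comparisons.
std::size_t linear_lower_bound(const std::vector<int>& keys, int key,
                               std::size_t* comparisons) {
    std::size_t i = 0;
    while (i < keys.size()) {
        ++*comparisons;
        if (keys[i] >= key) break;
        ++i;
    }
    return i;
}

// Same contract as linear_lower_bound, but halves the search range on
// every step. Requires keys to be sorted in ascending order.
std::size_t binary_lower_bound(const std::vector<int>& keys, int key,
                               std::size_t* comparisons) {
    std::size_t lo = 0;
    std::size_t hi = keys.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        ++*comparisons;
        if (keys[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

// Uses the simple linear scan for small arrays and binary search otherwise.
std::size_t hybrid_lower_bound(const std::vector<int>& keys, int key,
                               std::size_t* comparisons) {
    if (keys.size() <= kLinearThreshold) {
        return linear_lower_bound(keys, key, comparisons);
    }
    return binary_lower_bound(keys, key, comparisons);
}
```

With 100 sorted keys, looking up each key once costs the linear version 1 + 2 + ... + 100 = 5050 comparisons in total, or about 50 per lookup, while this binary version always takes 6 or 7. The sketch only counts comparisons. It says nothing about wall-clock time, which is where the linear scan's simpler loop makes up the difference.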

Read more evan-jones.a...