Ludicrous BERT Search Speeds

Submitted by Style Pass, 2021-07-01 01:00:06

Semantic search applications have two core ingredients: Text documents represented as vector embeddings that capture meaning, and a search algorithm to retrieve semantically similar items.
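As a rough illustration of those two ingredients, here is a minimal sketch using the sentence-transformers package. The model name is an assumption (any BERT-style encoder would do), and the documents are synthetic:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# "bert-base-nli-mean-tokens" is an assumption: a 768-dimensional
# BERT-based encoder; any sentence-embedding model would work here.
model = SentenceTransformer("bert-base-nli-mean-tokens")

docs = [
    "The cat sat on the mat.",
    "A feline rested on the rug.",
    "Stock markets fell sharply today.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)  # unit-length vectors

query_vec = model.encode(["Where is the cat?"], normalize_embeddings=True)[0]

# With unit-length vectors, cosine similarity is a plain dot product.
scores = doc_vecs @ query_vec
print(docs[int(np.argmax(scores))])  # most semantically similar document
```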

BERT is a popular embedding model because its pretrained versions provide high-quality embeddings without any additional training. However, those embeddings are high-dimensional (768 dimensions in this experiment), making exact k-nearest-neighbor (KNN) search over them unbearably slow at scale.
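To see why, consider the cost of exact search: every query must be scored against every document vector. A back-of-the-envelope sketch (the million-document corpus size comes from the article; the data here is synthetic):

```python
import numpy as np

# Exact (brute-force) k-NN over 1M 768-dimensional vectors. The index
# alone is ~3 GB in float32, and every query costs n_docs * dim
# multiply-adds before any ranking happens.
n_docs, dim, k = 1_000_000, 768, 10
index = np.random.rand(n_docs, dim).astype(np.float32)  # synthetic embeddings
query = np.random.rand(dim).astype(np.float32)

scores = index @ query                    # one dot product per document
top_k = np.argpartition(scores, -k)[-k:]  # indices of the k best matches
```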

Fortunately, there are ways to speed things up considerably. One article shows how to search through BERT (or any other) embeddings progressively faster in Elasticsearch, starting with native Elasticsearch scoring, then the Elastiknn plugin, then Approximate k-NN in Open Distro for Elasticsearch, and finally a GSI Associative Processing Unit (APU).
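For reference, the native baseline the article starts from is Elasticsearch's script_score query with the built-in cosineSimilarity Painless function (available since Elasticsearch 7.3). A sketch via the official Python client, with the index and field names assumed:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def knn_search(query_vec, k=10):
    # script_score rescores every matching document: exact but O(n),
    # which is exactly why the native approach is the slow baseline.
    # Assumes an index "docs" with a dense_vector field "embedding".
    body = {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # cosineSimilarity ranges over [-1, 1]; adding 1.0
                    # keeps Elasticsearch scores non-negative.
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": query_vec},
                },
            }
        },
    }
    return es.search(index="docs", body=body)
```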

In the end, the author accelerated a million-document search from 1,500ms with native Elasticsearch to 93ms with the GSI APU. An improvement of roughly 16x (1,500 / 93 is about 16.1), not bad! Not to mention the reduced index sizes and indexing times.
