Semantic Textual Search with Vector Embeddings

Submitted by
Style Pass
2021-06-11 04:00:06

The goal is to create a search application that retrieves news articles based on short description queries (e.g., article titles). To achieve that, we will store vector representations of the articles in a Pinecone index. The proximity between these vectors captures semantic relations: nearby vectors indicate similar content, and vectors that are far apart indicate dissimilar content.
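To make the idea of vector proximity concrete, here is a minimal sketch that ranks a few articles by cosine similarity to a query vector. The three-dimensional vectors, the article ids, and the choice of cosine similarity as the proximity measure are all assumptions made for illustration; real embeddings have hundreds of dimensions, and the index may be configured with a different metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, vectors):
    """Return article ids ordered from most to least similar to the query."""
    return sorted(
        vectors,
        key=lambda article_id: cosine_similarity(query, vectors[article_id]),
        reverse=True,
    )


# Hypothetical toy embeddings, invented for this example.
article_vectors = {
    "election-results": [0.9, 0.1, 0.0],
    "parliament-vote": [0.8, 0.2, 0.1],
    "football-final": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_vector, article_vectors)
print(ranking)
```

The two politics articles point in nearly the same direction as the query and rank first, while the sports article, whose vector is orthogonal to the query, ranks last.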

Semantic textual search is a technique that also underpins other text-based applications. For example, our deduplication, question-answering, and personalized article recommendation demos all use semantic textual search.

We will define two separate sub-indexes using Pinecone’s namespace feature: one for indexing articles by content and the other for indexing them by title. At query time, we will return an aggregation of the results from the content and title indexes.
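The text does not say how the two result sets are aggregated, so the following is only one plausible sketch. It assumes each namespace query yields a list of (article id, score) pairs where a higher score means a closer match, and it keeps the best score seen for each article. The function name, the input shape, and the "keep the maximum" rule are assumptions, not Pinecone's API or the method used in the original demo.

```python
def aggregate_results(title_matches, content_matches, top_k=3):
    """Merge two lists of (article_id, score) pairs.

    Assumed rule: an article found in both lists keeps its higher score.
    """
    best = {}
    for article_id, score in list(title_matches) + list(content_matches):
        if article_id not in best or score > best[article_id]:
            best[article_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical query results from the title and content namespaces.
title_matches = [("a1", 0.92), ("a2", 0.75)]
content_matches = [("a2", 0.88), ("a3", 0.60)]

merged = aggregate_results(title_matches, content_matches)
print(merged)
```

Other rules, such as summing or averaging the two scores, would favor articles that match on both title and content; which one works best depends on the data.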

We will use an Average Word Embeddings Model to create both title and content embeddings. Pinecone allows you to create partitions in the index, which are called namespaces. This lets us maintain separate embeddings of the same data that can be used for different tasks.
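As a rough illustration of what an average word embeddings model does, the sketch below averages per-word vectors into a single text vector and keeps the title and content vectors in separate partitions. The two-dimensional word vectors, the whitespace tokenizer, and the plain dictionary standing in for namespaces are all assumptions for this example; a real model ships pretrained word vectors, and the actual index is managed by Pinecone rather than a Python dict.

```python
def average_word_embedding(text, word_vectors, dim):
    """Average the vectors of the known words in text.

    Unknown words are skipped; text with no known words maps to a zero vector.
    """
    tokens = text.lower().split()
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy word vectors, invented for this example.
toy_word_vectors = {
    "stocks": [1.0, 0.0],
    "rise": [0.0, 1.0],
    "markets": [1.0, 1.0],
}

title_embedding = average_word_embedding("Stocks rise", toy_word_vectors, dim=2)
content_embedding = average_word_embedding(
    "Stocks rise as markets rally", toy_word_vectors, dim=2
)

# A plain dict standing in for two namespaces that hold the same article id.
namespaced_vectors = {
    "title": {"article-1": title_embedding},
    "content": {"article-1": content_embedding},
}
print(namespaced_vectors)
```

The same article id appears in both partitions with a different vector in each, which is what lets a title-style query and a content-style query be served from the same index.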
