Experiments with NLP and GPT-3


Semantic neural search has been around for a long time. As soon as we figured out that we could capture the essence of a sentence in a sentence embedding, we realized that we could store these embeddings in a database and query them to find sentences with a "similar meaning". The sentences need not share any words; as long as the meaning is similar, we can find them.
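To make the idea concrete, here is a minimal sketch of what that lookup amounts to: a brute-force nearest-neighbour search over stored vectors using cosine similarity. The in-memory "database" and function names are purely illustrative; a real application would use a proper vector store.

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity of two embedding vectors, in [-1, 1].
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search(query_embedding, stored_embeddings, sentences, top_k=3):
    # Score every stored sentence against the query and return the best matches.
    scores = [cosine_similarity(query_embedding, e) for e in stored_embeddings]
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(sentences[i], scores[i]) for i in ranked]
```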

People immediately saw that this could be applied to document search. But with the advent of ChatGPT and the exposure of this technology to a new audience, lots of innovative ideas are cropping up.

So I thought it might be a good idea to list the steps for building a similar application using semantic search.

Here, we assume we have already scraped the website or have the content of a book in a text file. First we need to extract the sentences using segmentation. We can use a library such as Stanza (https://stanfordnlp.github.io/stanza/tokenize.html), or fall back to basic rule-based segmentation, such as splitting on the full stop ".". A sketch with Stanza follows.
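Here is what that segmentation step looks like with Stanza's tokenize processor, assuming English text stored in a local file (the filename is just a placeholder):

```python
import stanza

# One-time model download for English.
stanza.download('en', processors='tokenize')

# A tokenize-only pipeline is enough for sentence segmentation.
nlp = stanza.Pipeline(lang='en', processors='tokenize')

with open('book.txt') as f:  # placeholder path for the scraped or book text
    text = f.read()

doc = nlp(text)
sentences = [sentence.text for sentence in doc.sentences]
```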

Once you have the sentences, you need to get embeddings for them. We can use Sentence Transformers for this. In our case, we used our own embeddings, which are 40-50 times smaller than the embeddings produced by Sentence Transformers.
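Since our compressed embeddings are not public, here is a sketch of the standard Sentence Transformers route; the model name is just a commonly used example, not necessarily the one we used:

```python
from sentence_transformers import SentenceTransformer

# 'all-MiniLM-L6-v2' is a popular general-purpose model; any sentence
# embedding model can be substituted here.
model = SentenceTransformer('all-MiniLM-L6-v2')

# `sentences` is the list produced by the segmentation step above.
embeddings = model.encode(sentences, convert_to_numpy=True)

# Each sentence now maps to a fixed-size vector, ready to be stored
# alongside its text and queried with the similarity search sketched earlier.
print(embeddings.shape)  # (number_of_sentences, embedding_dimension)
```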
