Suppose you had to guess the missing word in the sentence “My name ___ John Doe.” What would you pick? Most people would answer “is.” (A few interesting breeds would go for “was” as well.) You are essentially guessing the word with the highest probability of appearing there. Interestingly enough, machines work in a similar way. However, they don’t understand words in their textual form; the words first need to be converted into numbers. How do we do that? The answer is word embeddings.
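To make the "highest probability" idea concrete, here is a minimal sketch. The candidate words and their probabilities are made up for illustration and do not come from any real model.

```python
# Hypothetical probabilities for the missing word in "My name ___ John Doe".
# These numbers are invented for illustration only.
candidates = {"is": 0.90, "was": 0.07, "the": 0.02, "banana": 0.01}

# Guess the candidate with the highest probability.
best_guess = max(candidates, key=candidates.get)
print(best_guess)
```

A real model would compute such probabilities from numeric representations of the words rather than from a hand-written table.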
Consider two greetings: “Hi, how are you?” and “Hello, how is it going?” In this context, “Hi” and “Hello” carry the same meaning. If we were to organize words based on their meaning, “Hi” and “Hello” would naturally be grouped together.
This semantic similarity can be represented mathematically as well. If each word is represented as a point (a vector) in an n-dimensional space, words with similar meanings end up close to each other.
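One common way to measure that closeness is cosine similarity, which compares the directions of two vectors. The sketch below assumes tiny 3-dimensional vectors picked by hand for illustration; real embeddings are learned from data and usually have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
embeddings = {
    "hi": [0.9, 0.8, 0.1],
    "hello": [0.85, 0.75, 0.15],
    "banana": [0.1, 0.2, 0.95],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_greetings = cosine_similarity(embeddings["hi"], embeddings["hello"])
sim_unrelated = cosine_similarity(embeddings["hi"], embeddings["banana"])

print(f"hi vs hello:  {sim_greetings:.3f}")
print(f"hi vs banana: {sim_unrelated:.3f}")
```

With these toy vectors, “hi” and “hello” score much closer to 1 than “hi” and “banana” do, which is the numeric counterpart of the two greetings sitting near each other in the space.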