Idea: Provide additional spatial information to embedding vectors
Embedding vectors represent the semantic meaning of words, encapsulating aspects like syntax and context within a given corpus.
Positional encodings, on the other hand, provide spatial (positional) information, indicating where each word sits in the sequence.
When you add these positional encodings to the embeddings, you're essentially enriching the embeddings with information about word order.
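For concreteness, a minimal NumPy sketch of this addition, assuming the sinusoidal encoding scheme from the original Transformer paper; the token embeddings here are random stand-ins for learned ones:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]          # (1, d_model / 2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)   # one frequency per dimension pair
    angles = positions * angle_rates                        # (seq_len, d_model / 2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

# Toy embeddings for a 6-token sequence with d_model = 16 (random stand-ins).
seq_len, d_model = 6, 16
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(seq_len, d_model))

# Enrich the embeddings with word-order information by simple element-wise addition.
enriched = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```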
The scales of positional encodings and word embeddings are typically designed to be compatible (a quick numeric check follows this list):
not so large that the positional signal drowns out the semantic content of the embeddings
not so small that it becomes insignificant
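A rough sketch of this scale compatibility, reusing the function and variables from the snippet above: sinusoidal encodings are bounded in [-1, 1] by construction, and in the original Transformer the embedding weights are multiplied by sqrt(d_model) before the addition, which keeps the semantic signal dominant.

```python
# Rough scale comparison, reusing sinusoidal_positional_encoding from above.
pe = sinusoidal_positional_encoding(seq_len, d_model)

# Sinusoidal encodings are bounded in [-1, 1] by construction.
print("PE value range:        ", pe.min(), pe.max())

# In the original Transformer, embeddings are scaled by sqrt(d_model)
# before the addition, which raises their typical magnitude above the PE's.
scaled_embeddings = token_embeddings * np.sqrt(d_model)
print("embedding mean L2 norm:", np.linalg.norm(scaled_embeddings, axis=-1).mean())
print("PE mean L2 norm:       ", np.linalg.norm(pe, axis=-1).mean())
```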
When embeddings and positional encodings are added, the dominant signal remains the semantic content of the embeddings, with positional information providing a subtle yet important secondary influence.
During training, the model learns to extract and use the positional information along with the semantic content.