Word Embedding using Universal Sentence Encoder in Python
Unlike word embedding techniques, which represent individual words as vectors, sentence embeddings map an entire sentence or text, along with its semantic information, to a vector of real numbers. This makes it possible to process the information in a whole text at once, which helps capture the context and meaning of the sentence.
In this article, you will learn how to create vectors for a complete sentence using the Universal Sentence Encoder.
Let’s consider two sentences:
- How old are you?
- What is your age?
These two sentences have the same meaning: both ask for the person’s age. Looking at the individual words and their vectors gives little insight into what each complete sentence conveys, and it is not enough to tell whether the two sentences are similar. In such scenarios, sentence embeddings perform better than word embeddings.
There are various sentence embedding techniques, such as Doc2Vec, SentenceBERT, and Universal Sentence Encoder.
Universal Sentence Encoder
Universal Sentence Encoder encodes an entire sentence or text into a vector of real numbers that can be used for clustering, sentence similarity, text classification, and other natural language processing (NLP) tasks. The pre-trained model is available under the Apache-2.0 License. It is trained on text longer than a single word (phrases, sentences, paragraphs, etc.) using a deep averaging network (DAN) encoder.
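To give a rough intuition for the deep averaging network mentioned above, the sketch below averages the word vectors of a sentence and passes the result through a single dense layer. It is a toy illustration, not the actual Universal Sentence Encoder: the names `average_vectors`, `dense_tanh`, and `dan_encode`, the two-dimensional word vectors, and the weights are all made up for the example.

```python
import math

# Hypothetical 2-dimensional word vectors, invented for this example.
WORD_VECTORS = {
    "how": [0.2, 0.4],
    "old": [0.6, 0.0],
    "are": [0.1, 0.3],
    "you": [0.5, 0.5],
}

# Hypothetical dense layer: 3 output units, each with 2 input weights.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
BIAS = [0.0, 0.0, 0.1]


def average_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dense_tanh(x, weights, bias):
    """One fully connected layer followed by a tanh activation."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def dan_encode(tokens, word_vectors, weights, bias):
    """Average the vectors of known tokens, then apply the dense layer."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        raise ValueError("no known tokens to encode")
    return dense_tanh(average_vectors(vectors), weights, bias)


sentence_vector = dan_encode(["how", "old", "are", "you"], WORD_VECTORS, WEIGHTS, BIAS)
print(sentence_vector)
```

Because the first step is an average, the order of the tokens does not affect the result in this sketch; a real model uses many more dimensions and several trained layers.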
Implementation of sentence embeddings using Universal Sentence Encoder:
Run these commands in your terminal before running the code to install the necessary libraries.
pip install "tensorflow>=2.0.0"
pip install --upgrade tensorflow-hub
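The commands above install TensorFlow and TensorFlow Hub. A minimal sketch of loading the encoder and embedding a few sentences might look like the following. The model URL (assumed here to be version 4 of the encoder on TensorFlow Hub), the helper name `embed_sentences`, and the last two example sentences are assumptions made for illustration, not taken from this article.

```python
# Assumed TensorFlow Hub handle for the pre-trained Universal Sentence Encoder.
MODEL_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"


def embed_sentences(sentences, encoder=None):
    """Return one embedding per sentence.

    `encoder` is any callable that maps a list of strings to vectors. When it
    is omitted, the pre-trained model is loaded from TensorFlow Hub, which
    downloads it on first use.
    """
    if encoder is None:
        # Imported lazily so the helper can be defined without TensorFlow loaded.
        import tensorflow_hub as hub

        encoder = hub.load(MODEL_URL)
    return encoder(list(sentences))


def main():
    sentences = [
        "How old are you?",
        "What is your age?",
        "I love to watch television.",  # hypothetical extra sentence
        "I am wasting my time.",  # hypothetical extra sentence
    ]
    embeddings = embed_sentences(sentences)
    # Prints a tensor with one row per input sentence.
    print(embeddings)
```

Calling `main()` loads the model and prints the embeddings. The optional `encoder` argument is a design choice for this sketch: it allows a different model, or a stub in tests, to be passed in without changing the rest of the code.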
tf.Tensor(
[[-0.06045125 -0.00204541 0.02656925 ... 0.00764413 -0.02669661
 [-0.08415682 -0.08687923 0.03446117 ... -0.01439389 -0.04546221
 [ 0.0816019 -0.01570276 -0.05659245 ... -0.07133699 0.11040762
 [-0.00369539 0.03064634 -0.05556112 ... 0.01751423 0.0316496
  -0.05139377]], shape=(4, 512), dtype=float32)
The above output shows the input sentences encoded as vectors by the Universal Sentence Encoder. As the shape (4, 512) indicates, each of the four input sentences is mapped to a 512-dimensional vector.
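Once sentences are encoded as vectors, their similarity can be measured with cosine similarity. The following is a minimal sketch using plain Python lists. The function name `cosine_similarity` and the short toy vectors are assumptions for illustration; real embeddings from the encoder would have 512 dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for sentence embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

A score close to 1 means the two vectors point in nearly the same direction, which for sentence embeddings suggests similar meaning. With a TensorFlow tensor of embeddings, a row can be converted to a list with `embeddings[i].numpy().tolist()` before being passed to this function.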