
Add text embedding serving #206

Closed
sonic182 opened this issue Apr 24, 2023 · 5 comments · Fixed by #214

Comments


sonic182 commented Apr 24, 2023

Is there a way to obtain embeddings from text? For example, to extract the 768-dimensional vector for a given text using a BERT model.

Something similar to this Python example (with transformers and torch dependencies):

from transformers import AutoTokenizer, AutoModel
import torch

# Load pre-trained model tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize input text
text = "Hello, world!"
tokens = tokenizer.encode(text, add_special_tokens=True, return_tensors="pt")

# Run the model and take the last hidden state (per-token embeddings)
with torch.no_grad():
    embeddings = model(tokens)[0].squeeze(0)  # remove the batch dimension

# Print the embeddings for the first token
print(embeddings[0])
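The snippet above prints only the embedding of the first token (the `[CLS]` token). A common way to get a single fixed-size vector for the whole sentence, not shown above, is masked mean pooling over the token embeddings, using the attention mask so padding positions are ignored. A minimal NumPy sketch of just the pooling step, with toy arrays standing in for real model outputs:

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, skipping padding positions.

    hidden_states: (seq_len, hidden_dim) token embeddings from the model
    attention_mask: (seq_len,) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(hidden_states.dtype)  # (seq_len, 1)
    summed = (hidden_states * mask).sum(axis=0)                 # sum over real tokens only
    count = mask.sum()                                          # number of real tokens
    return summed / count

# Toy example: 3 tokens (the last one is padding), hidden size 2
h = np.array([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]])
m = np.array([1, 1, 0])
print(mean_pool(h, m))  # → [2. 3.]
```

The padded token's (arbitrary) values do not leak into the result because the mask zeroes them out before summing.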
@trodrigu

#100 may have some things that can help

@jonatanklosko (Member)

Hey @sonic182, here's code that matches your Python transformers example:

{:ok, model_info} = Bumblebee.load_model({:hf, "bert-base-uncased"}, architecture: :base)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-uncased"})

text = "Hello, world!"
inputs = Bumblebee.apply_tokenizer(tokenizer, text)

Axon.predict(model_info.model, model_info.params, inputs).hidden_state[0]

@trodrigu thanks for the reference :)

It would make sense to have a serving pipeline to streamline this use case, so I will keep this open :)

@jonatanklosko jonatanklosko changed the title Get embeddings from text Add text embedding serving Apr 27, 2023
@rakshans1

from transformers import CLIPModel, CLIPProcessor
import torch

# Load the pre-trained CLIP model and its processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Preprocess the text and compute its feature vector
text = "Hello, world!"
inputs = processor(text=text, return_tensors="pt")
with torch.no_grad():
    text_features = model.get_text_features(**inputs)

Is it possible to build servings for get_text_features / get_image_features?

@jonatanklosko (Member)

@rakshans1 you can do both text and image the same way as the snippet above (featurizer/tokenizer + running the base model). And yeah, we will have servings for both text and image.
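For context on what those feature vectors are used for: CLIP scores text/image pairs by cosine similarity of the L2-normalized vectors returned by get_text_features / get_image_features. A minimal NumPy sketch of that scoring step, with made-up vectors standing in for real model outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    a = a / np.linalg.norm(a)  # L2-normalize both vectors
    b = b / np.linalg.norm(b)
    return float(a @ b)

# Made-up stand-ins for get_text_features / get_image_features outputs
text_features = np.array([0.2, 0.9, 0.1])
image_features = np.array([0.25, 0.85, 0.05])
print(cosine_similarity(text_features, image_features))  # close to 1.0 for a good match
```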
