
Add text embedding serving #206

Closed
sonic182 opened this issue Apr 24, 2023 · 5 comments · Fixed by #214

Comments


sonic182 commented Apr 24, 2023

Is there a way to obtain embeddings from text? For example, to extract the 768-dimensional vector for a given text using a BERT model.

Something similar to this Python example (with transformers and torch dependencies):

from transformers import AutoTokenizer, AutoModel
import torch

# Load pre-trained model tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize input text
text = "Hello, world!"
tokens = tokenizer.encode(text, add_special_tokens=True, return_tensors="pt")

# Run the model and take the last hidden state (per-token embeddings)
with torch.no_grad():
    embeddings = model(tokens)[0].squeeze(0)  # remove the batch dimension

# Print the embeddings for the first token
print(embeddings[0])
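The snippet above prints only the embedding of the first token (the `[CLS]` token). A common way to get a single fixed-size vector for the whole sentence, not shown above, is masked mean pooling over the token embeddings, using the attention mask so padding positions are ignored. A minimal NumPy sketch of just the pooling step, with toy arrays standing in for real model outputs:

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, skipping padding positions.

    hidden_states: (seq_len, hidden_dim) token embeddings from the model
    attention_mask: (seq_len,) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(hidden_states.dtype)  # (seq_len, 1)
    summed = (hidden_states * mask).sum(axis=0)                 # sum over real tokens only
    count = mask.sum()                                          # number of real tokens
    return summed / count

# Toy example: 3 tokens (the last one is padding), hidden size 2
h = np.array([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]])
m = np.array([1, 1, 0])
print(mean_pool(h, m))  # → [2. 3.]
```

The padded token's (arbitrary) values do not leak into the result because the mask zeroes them out before summing.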
@trodrigu

#100 may have some things that can help

@jonatanklosko (Member)

Hey @sonic182, here's code that matches your Python transformers example:

{:ok, model_info} = Bumblebee.load_model({:hf, "bert-base-uncased"}, architecture: :base)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-uncased"})

text = "Hello, world!"
inputs = Bumblebee.apply_tokenizer(tokenizer, text)

Axon.predict(model_info.model, model_info.params, inputs).hidden_state[0]

@trodrigu thanks for the reference :)

It would make sense to have a serving pipeline to streamline this use case, so I will keep this open :)

@jonatanklosko jonatanklosko changed the title Get embeddings from text Add text embedding serving Apr 27, 2023
@rakshans1

from transformers import CLIPModel, CLIPProcessor
import torch

# Load the pre-trained CLIP model and its processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Preprocess the text and compute its feature vector
text = "Hello, world!"
inputs = processor(text=text, return_tensors="pt")
with torch.no_grad():
    text_features = model.get_text_features(**inputs)

Is it possible to build servings for get_text_features / get_image_features?

@jonatanklosko (Member)

@rakshans1 you can do both text and image the same way as the snippet above (featurizer/tokenizer + running the base model). And yeah, we will have servings for both text and image.
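For context on what those feature vectors are used for: CLIP scores text/image pairs by cosine similarity of the L2-normalized vectors returned by get_text_features / get_image_features. A minimal NumPy sketch of that scoring step, with made-up vectors standing in for real model outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    a = a / np.linalg.norm(a)  # L2-normalize both vectors
    b = b / np.linalg.norm(b)
    return float(a @ b)

# Made-up stand-ins for get_text_features / get_image_features outputs
text_features = np.array([0.2, 0.9, 0.1])
image_features = np.array([0.25, 0.85, 0.05])
print(cosine_similarity(text_features, image_features))  # close to 1.0 for a good match
```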
