Generate Llama 2 from Embeddings #72
Comments
In the transformers repo they said [...]. Here is the error: see more details on the issue page: huggingface/transformers#28396. Of course, my general goal is simply to get this working with input embeddings, so if this is not the right route, let me know.
Hi @liechtym, we do not have support for external embeddings. One way you could potentially get around this is by replacing the model embedding weights directly. Please let us know if that helps.
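A minimal sketch of the suggested workaround. The real Neuron-compiled model object has a different structure, so this uses a standalone `torch.nn.Embedding` purely to illustrate the general idea: overwrite the rows of the embedding weight matrix so that chosen token ids resolve to your custom vectors, then run the normal token-id forward pass.

```python
# Hypothetical sketch: 'embed_tokens' stands in for the model's embedding
# layer; sizes are toy values, not Llama 2's actual dimensions.
import torch
import torch.nn as nn

vocab_size, hidden = 8, 4
embed_tokens = nn.Embedding(vocab_size, hidden)

# Suppose we computed and modified an embedding for token id 3 externally.
custom_vec = torch.full((hidden,), 0.5)

# Overwrite that row of the weight matrix in place (no gradient tracking).
with torch.no_grad():
    embed_tokens.weight[3].copy_(custom_vec)

# A normal forward pass over token ids now picks up the modified row.
out = embed_tokens(torch.tensor([3]))
print(torch.allclose(out[0], custom_vec))  # True
```

Note the limitation: this only works if each modified embedding can be tied to a specific token id, since the model still performs an id-to-row lookup internally.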
@shebbur-aws Thanks for your reply. A workaround is totally fine for me. Would you be able to give a quick explanation or example of how to replace the embedding weights and run the forward pass on the rest of the model?
Could I get help on this, @shebbur-aws?
@liechtym @shebbur-aws Hi, I've run into the same situation. Do you have any resolution or workaround for this? I want to pass input embeds as the model input instead of input ids. Thanks!
Compiling and loading Llama 2 on Neuron is working great for me on an inf2.8xlarge with the new 2.16 release. However, I have a unique use case where I need to input embeddings directly into Llama 2 instead of token ids: I need to generate the embeddings, modify them, and then use the modified embeddings for generation. I was already able to generate the embeddings separately via `llama_model.chkpt_model.model.embed_tokens(token_ids)`. However, I'm not seeing a way to plug those embeddings back into the model once I've modified them.

It seems to me that `LlamaForSampling.sample()` (from `transformers_neuronx.llama.model`) probably can't do this (correct me if I'm wrong). I got `TypeError: sample() got an unexpected keyword argument 'inputs_embeds'` when I tried.

So I tried using the `HuggingFaceGenerationModelAdapter` from `transformers_neuronx.generation_utils` to enable the generation API, as was done in the GPT-2 example. However, an error prevented that, which I filed an issue for in the transformers repo.

What is the best way to go about doing this? I really appreciate your help.
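For reference, the lookup-modify-feed-back workflow described above can be sketched with a toy PyTorch module. `TinyDecoder` is entirely hypothetical (not part of transformers or transformers-neuronx); it only shows the `inputs_embeds` pattern that standard `transformers` models expose and that `LlamaForSampling.sample()` currently does not.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy stand-in for a decoder model, to illustrate the inputs_embeds pattern."""
    def __init__(self, vocab=8, hidden=4):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, hidden)
        self.proj = nn.Linear(hidden, vocab)

    def forward(self, input_ids=None, inputs_embeds=None):
        if inputs_embeds is None:
            inputs_embeds = self.embed_tokens(input_ids)  # normal token-id path
        return self.proj(inputs_embeds)                   # "rest of the model"

model = TinyDecoder()
ids = torch.tensor([[1, 2, 3]])

# Look up the embeddings, modify them, then feed them back in directly,
# bypassing the embedding lookup -- the workflow the issue asks for.
embeds = model.embed_tokens(ids)
embeds = embeds + 0.1
logits = model(inputs_embeds=embeds)
print(logits.shape)  # torch.Size([1, 3, 8])
```

Unlike the weight-replacement workaround, this pattern supports arbitrary per-position embedding edits, but it requires the model's forward/sampling entry point to accept an `inputs_embeds` argument.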