
Generate Llama 2 from Embeddings #72

Open
liechtym opened this issue Jan 8, 2024 · 5 comments

Comments

@liechtym

liechtym commented Jan 8, 2024

Compiling and loading Llama 2 on Neuron is working great for me on an inf2.8xlarge with the new 2.16 release.

However, I have a unique use case where I need to feed embeddings directly into Llama 2 instead of token ids: I need to generate the embeddings, modify them, and then use the modified embeddings for generation. I was already able to generate the embeddings separately via llama_model.chkpt_model.model.embed_tokens(token_ids). However, I don't see a way to plug those embeddings back into the model once I've modified them.

It seems to me that LlamaForSampling.sample() (from transformers_neuronx.llama.model) probably can't do this (correct me if I'm wrong). I got TypeError: sample() got an unexpected keyword argument 'inputs_embeds' when I tried.
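
For context, this is roughly what the flow looks like on my end (a sketch; the paths, tp_degree, and sampling parameters are placeholders for my actual setup):

```python
import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

# Placeholder paths/parameters -- adjust to the actual checkpoint and instance.
tokenizer = AutoTokenizer.from_pretrained("./llama-2-7b")
llama_model = LlamaForSampling.from_pretrained("./llama-2-7b", batch_size=1, tp_degree=2, amp="f16")
llama_model.to_neuron()

token_ids = tokenizer("Hello, world", return_tensors="pt").input_ids

# Generating the embeddings separately works:
embeds = llama_model.chkpt_model.model.embed_tokens(token_ids)
embeds = embeds * 1.0  # ...the embeddings get modified here...

# ...but there is no obvious way to feed them back in. This raises:
# TypeError: sample() got an unexpected keyword argument 'inputs_embeds'
output = llama_model.sample(inputs_embeds=embeds, sequence_length=128, top_k=50)
```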

So, I tried using the HuggingFaceGenerationModelAdapter from transformers_neuronx.generation_utils to enable the generation API, as was done in this GPT-2 example. However, an error prevented that, which I filed an issue for in the transformers repo.
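
Concretely, the adapter route was along these lines (again only a sketch; the config is just loaded from the same checkpoint, and the generate() call is what I was hoping to reach):

```python
from transformers import AutoConfig
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

# llama_model is the compiled LlamaForSampling instance from the snippet above;
# the adapter only needs a Hugging Face config plus the Neuron model.
hf_config = AutoConfig.from_pretrained("./llama-2-7b")
adapter = HuggingFaceGenerationModelAdapter(hf_config, llama_model)

# The goal was to then use the standard generation API with embeddings, e.g.
#   adapter.generate(inputs_embeds=embeds, max_new_tokens=128)
# but the adapter constructor itself fails with the error filed in the
# transformers issue (traceback in my next comment).
```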

What is the best way to go about doing this? I really appreciate your help.

@liechtym

liechtym commented Jan 10, 2024

In the transformers repo they said the HuggingFaceGenerationModelAdapter incompatibility error probably stems from the transformers-neuronx wrapper. Any help with this?

Here is the error:

Traceback (most recent call last):
  File "modular.py", line 107, in <module>
    chatbot = MiniGPT4LLama2Chatbot(cfg_path, gpu_id)
  File "modular.py", line 62, in __init__
    self.model = model_cls.from_config(model_config)
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 173, in from_config
    model = cls(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 45, in __init__
    super().__init__(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt_base.py", line 43, in __init__
    self.llama_model, self.llama_tokenizer = self.init_llm(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/base_model.py", line 202, in init_llm
    llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/generation_utils.py", line 18, in __init__
    super().__init__(config)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1190, in __init__
    config = self._autoset_attn_implementation(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1311, in _autoset_attn_implementation
    config = cls._check_and_enable_sdpa(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1464, in _check_and_enable_sdpa
    raise ValueError(
ValueError: HuggingFaceGenerationModelAdapter does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new

See more details on the issue page: huggingface/transformers#28396.
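
One thing I have not been able to verify yet is whether explicitly requesting eager attention on the config passed to the adapter would sidestep this check. A sketch of that idea (note it touches a private transformers attribute, so it may not be a supported path):

```python
# Untested idea: ask transformers not to auto-select SDPA for the wrapper class.
llama_model_cpu.config._attn_implementation = "eager"
llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
```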

Of course, my general goal is simply to get this working with input embeddings, so if this is not the right route, let me know.

@shebbur-aws

Hi @liechtym, we do not have support for external embeddings. One way you could potentially get around this is by replacing the model embedding weights directly. Please let us know if that helps.
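
One rough, untested reading of that suggestion (it assumes llama_model is the LlamaForSampling instance, that the embedding table is patched before to_neuron() compiles the weights, and the placeholder ids and random vectors below are just stand-ins):

```python
import torch

# Reserve a few token ids and overwrite their rows of the embedding table with
# the custom vectors, so that passing those ids to sample() effectively feeds
# the modified embeddings into the model.
embed_weight = llama_model.chkpt_model.model.embed_tokens.weight
placeholder_ids = torch.tensor([31996, 31997, 31998, 31999])  # hypothetical "spare" ids
custom_embeds = torch.randn(len(placeholder_ids), embed_weight.shape[1])  # stand-in for the modified embeddings

with torch.no_grad():
    embed_weight[placeholder_ids] = custom_embeds.to(embed_weight.dtype)

llama_model.to_neuron()  # compile/load with the patched embedding table
# input_ids containing placeholder_ids will now look up the custom vectors
```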

@liechtym

@shebbur-aws Thanks for your reply. A workaround is totally fine for me. Would you be able to give a quick explanation or example of how to replace the embedding weights and run the forward pass on the rest of the model?

@liechtym

Could I get help on this, @shebbur-aws?

@davidshtian

@liechtym @shebbur-aws Hi~ I've got the same situation here. Do you have any resolution or workaround for this, i.e. passing input embeddings as the model input instead of input ids? Thanks~
