
Llama 2 Transformers Neuron X issue #28396

Closed

liechtym opened this issue Jan 8, 2024 · 2 comments

Comments


liechtym commented Jan 8, 2024

I was trying to use the generate API for Llama 2 using the same code from this example:
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#features

My code:

import torch
from transformers import LlamaForCausalLM
from transformers_neuronx.llama.model import LlamaForSampling
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

# CPU copy of the model, loaded mainly for its config
llama_model_cpu = LlamaForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype=torch.float16,
)

# Load the split checkpoint and compile for two NeuronCores
llama_model_neuron = LlamaForSampling.from_pretrained('/home/ubuntu/Llama-2-7b-chat-hf-split', batch_size=1, tp_degree=2, amp='f16')
llama_model_neuron.to_neuron()

print('Config: ', llama_model_cpu.config)
llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)

Error:

Traceback (most recent call last):
  File "modular.py", line 107, in <module>
    chatbot = MiniGPT4LLama2Chatbot(cfg_path, gpu_id)
  File "modular.py", line 62, in __init__
    self.model = model_cls.from_config(model_config)
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 173, in from_config
    model = cls(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 45, in __init__
    super().__init__(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt_base.py", line 43, in __init__
    self.llama_model, self.llama_tokenizer = self.init_llm(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/base_model.py", line 202, in init_llm
    llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/generation_utils.py", line 18, in __init__
    super().__init__(config)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1190, in __init__
    config = self._autoset_attn_implementation(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1311, in _autoset_attn_implementation
    config = cls._check_and_enable_sdpa(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1464, in _check_and_enable_sdpa
    raise ValueError(
ValueError: HuggingFaceGenerationModelAdapter does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new

Is there a workaround for this? Or is supporting this attention implementation the only way? I simply want to use the generate API with a Neuron-compiled model.
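
For reference, the generate call being targeted looks roughly like this (a sketch following the linked developer guide; the prompt and generation arguments are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')
input_ids = tokenizer('Hello, how are you?', return_tensors='pt').input_ids

# generate() runs sampling on the Neuron-compiled model through the adapter
output = llama_model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))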

@ArthurZucker
Collaborator

Sorry, I don't think this is related to transformers, since there is a wrapper around it. SDPA is natively supported in transformers.
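
A possible workaround, not confirmed in this thread: the error comes from transformers' SDPA capability check in _autoset_attn_implementation, so requesting the eager attention path on the config before building the adapter should skip that check. A minimal sketch, assuming a transformers release (4.36+) where the config exposes the private _attn_implementation attribute:

config = llama_model_cpu.config
config._attn_implementation = 'eager'  # private attribute; bypasses _check_and_enable_sdpa

llama_model = HuggingFaceGenerationModelAdapter(config, llama_model_neuron)

Passing attn_implementation='eager' to LlamaForCausalLM.from_pretrained should leave the same setting on the config.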


github-actions bot commented Feb 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
