
Llama 2 Transformers Neuron X issue #28396

Closed

liechtym opened this issue Jan 8, 2024 · 2 comments

Comments


liechtym commented Jan 8, 2024

I was trying to use the generate API for Llama 2 using the same code from this example:
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#features

My code:

import torch
from transformers import LlamaForCausalLM
from transformers_neuronx.llama.model import LlamaForSampling
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

# CPU copy of the model, loaded mainly for its config
llama_model_cpu = LlamaForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype=torch.float16,
)

# Load the split checkpoint and compile for two NeuronCores
llama_model_neuron = LlamaForSampling.from_pretrained('/home/ubuntu/Llama-2-7b-chat-hf-split', batch_size=1, tp_degree=2, amp='f16')
llama_model_neuron.to_neuron()

print('Config: ', llama_model_cpu.config)
llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)

Error:

Traceback (most recent call last):
  File "modular.py", line 107, in <module>
    chatbot = MiniGPT4LLama2Chatbot(cfg_path, gpu_id)
  File "modular.py", line 62, in __init__
    self.model = model_cls.from_config(model_config)
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 173, in from_config
    model = cls(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 45, in __init__
    super().__init__(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt_base.py", line 43, in __init__
    self.llama_model, self.llama_tokenizer = self.init_llm(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/base_model.py", line 202, in init_llm
    llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/generation_utils.py", line 18, in __init__
    super().__init__(config)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1190, in __init__
    config = self._autoset_attn_implementation(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1311, in _autoset_attn_implementation
    config = cls._check_and_enable_sdpa(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1464, in _check_and_enable_sdpa
    raise ValueError(
ValueError: HuggingFaceGenerationModelAdapter does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new

Is there a workaround for this? Or is supporting this attention implementation the only way? I simply want to use the generate API with a Neuron-compiled model.
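
For reference, the generate call being targeted looks roughly like this (a sketch following the linked developer guide; the prompt and generation arguments are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')
input_ids = tokenizer('Hello, how are you?', return_tensors='pt').input_ids

# generate() runs sampling on the Neuron-compiled model through the adapter
output = llama_model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))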

@ArthurZucker
Collaborator

Sorry, I don't think this is related to transformers, since there is a wrapper around it. SDPA is natively supported in transformers.
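
A possible workaround, not confirmed in this thread: the error comes from transformers' SDPA capability check in _autoset_attn_implementation, so requesting the eager attention path on the config before building the adapter should skip that check. A minimal sketch, assuming a transformers release (4.36+) where the config exposes the private _attn_implementation attribute:

config = llama_model_cpu.config
config._attn_implementation = 'eager'  # private attribute; bypasses _check_and_enable_sdpa

llama_model = HuggingFaceGenerationModelAdapter(config, llama_model_neuron)

Passing attn_implementation='eager' to LlamaForCausalLM.from_pretrained should leave the same setting on the config.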


github-actions bot commented Feb 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
